Hongning Wang

IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation

Mar 05, 2026

RAVEL: Reasoning Agents for Validating and Evaluating LLM Text Synthesis

Feb 28, 2026

RLAR: An Agentic Reward System for Multi-task Reinforcement Learning on Large Language Models

Feb 28, 2026

Grounding LLMs in Scientific Discovery via Embodied Actions

Feb 24, 2026

GLM-5: from Vibe Coding to Agentic Engineering

Feb 17, 2026

Reasoning to Rank: An End-to-End Solution for Exploiting Large Language Models for Recommendation

Feb 13, 2026

The Missing Half: Unveiling Training-time Implicit Safety Risks Beyond Deployment

Feb 04, 2026

Trust-Region Adaptive Policy Optimization

Dec 19, 2025

Data-Efficient RLVR via Off-Policy Influence Guidance

Oct 30, 2025

Think Socially via Cognitive Reasoning

Sep 26, 2025