Hongning Wang

IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation

Mar 05, 2026
Via arXiv

RLAR: An Agentic Reward System for Multi-task Reinforcement Learning on Large Language Models

Feb 28, 2026
Via arXiv

RAVEL: Reasoning Agents for Validating and Evaluating LLM Text Synthesis

Feb 28, 2026
Via arXiv

Grounding LLMs in Scientific Discovery via Embodied Actions

Feb 24, 2026
Via arXiv

GLM-5: from Vibe Coding to Agentic Engineering

Feb 17, 2026
Via arXiv

Reasoning to Rank: An End-to-End Solution for Exploiting Large Language Models for Recommendation

Feb 13, 2026
Via arXiv

The Missing Half: Unveiling Training-time Implicit Safety Risks Beyond Deployment

Feb 04, 2026
Via arXiv

Trust-Region Adaptive Policy Optimization

Dec 19, 2025
Via arXiv

Data-Efficient RLVR via Off-Policy Influence Guidance

Oct 30, 2025
Via arXiv

Think Socially via Cognitive Reasoning

Sep 26, 2025
Via arXiv