Yafu Li

ExGRPO: Learning to Reason from Experience

Oct 02, 2025

Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation

Sep 18, 2025

A Survey of Reinforcement Learning for Large Reasoning Models

Sep 10, 2025

Synthesizing Sheet Music Problems for Evaluation and Reinforcement Learning

Sep 04, 2025

SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law

Jul 24, 2025

Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning

Jun 04, 2025

Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models

May 20, 2025

Learning to Reason under Off-Policy Guidance

Apr 22, 2025

SEE: Continual Fine-tuning with Sequential Ensemble of Experts

Apr 09, 2025

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

Mar 27, 2025