Ruofei Zhu

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
Apr 08, 2025
Via arXiv

Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
Mar 31, 2025
Via arXiv

DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Mar 18, 2025
Via arXiv

What's Behind PPO's Collapse in Long-CoT? Value Optimization Holds the Secret
Mar 03, 2025
Via arXiv