Yu Yue

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

Apr 08, 2025
Via arXiv

A Unified Pairwise Framework for RLHF: Bridging Generative Reward Modeling and Policy Optimization

Apr 07, 2025
Via arXiv

Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback

Mar 31, 2025
Via arXiv

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Mar 18, 2025
Via arXiv

What's Behind PPO's Collapse in Long-CoT? Value Optimization Holds the Secret

Mar 03, 2025
Via arXiv

A Survey on Natural Language Counterfactual Generation

Jul 04, 2024
Via arXiv