Zhiyu Mei

On Designing Effective RL Reward at Training Time for LLM Reasoning

Oct 19, 2024

ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation

Jun 20, 2024

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

Apr 16, 2024

SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores

Jul 05, 2023