Wenjie Ye

On Designing Effective RL Reward at Training Time for LLM Reasoning

Oct 19, 2024
Via arXiv

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

Apr 16, 2024
Via arXiv

Towards Blockchain-Assisted Privacy-Aware Data Sharing For Edge Intelligence: A Smart Healthcare Perspective

Jun 29, 2023
Via arXiv

What Happened 3 Seconds Ago? Inferring the Past with Thermal Imaging

Apr 26, 2023
Via arXiv