Weilin Liu

On Designing Effective RL Reward at Training Time for LLM Reasoning

Oct 19, 2024

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

Apr 16, 2024

Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased

Feb 03, 2023

Multi-Agent Vulnerability Discovery for Autonomous Driving with Hazard Arbitration Reward

Dec 12, 2021