Wei Fu

On Designing Effective RL Reward at Training Time for LLM Reasoning

Oct 19, 2024

ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation

Jun 20, 2024

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

Apr 16, 2024

Learning Agile Bipedal Motions on a Quadrupedal Robot

Nov 10, 2023

Iteratively Learn Diverse Strategies with State Distance Information

Oct 23, 2023

SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores

Jul 05, 2023

Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning

Jun 15, 2022

Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization

Apr 04, 2022

How to "DODGE" Complex Software Analytics?

Feb 05, 2019

What is Wrong with Topic Modeling? (and How to Fix it Using Search-based Software Engineering)

Feb 20, 2018