Runlong Zhou

The Crucial Role of Samplers in Online Direct Preference Optimization

Sep 29, 2024
Via arXiv

Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques

Sep 04, 2024
Via arXiv

Reflect-RL: Two-Player Online RL Fine-Tuning for LMs

Feb 20, 2024
Via arXiv

Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning

Oct 30, 2023
Via arXiv

Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments

Jan 31, 2023
Via arXiv

Understanding Curriculum Learning in Policy Optimization for Solving Combinatorial Optimization Problems

Feb 11, 2022
Via arXiv

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Apr 22, 2021
Via arXiv