Wenhao Zhan

Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF

Oct 06, 2024

Exploiting Structure in Offline Multi-Agent RL: The Benefits of Low Interaction Rank

Oct 01, 2024

Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-squared Preference Optimization

Jul 18, 2024

REBEL: Reinforcement Learning via Regressing Relative Rewards

Apr 25, 2024

Dataset Reset Policy Optimization for RLHF

Apr 15, 2024

Optimal Multi-Distribution Learning

Dec 08, 2023

Provably Efficient CVaR RL in Low-rank MDPs

Nov 20, 2023

How to Query Human Feedback Efficiently in RL?

May 29, 2023

Provable Offline Reinforcement Learning with Human Feedback

May 24, 2023

Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning

May 17, 2023