Han Zhong

A3S: A General Active Clustering Method with Pairwise Constraints

Jul 14, 2024

Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond

Jun 03, 2024

DPO Meets PPO: Reinforced Token Optimization for RLHF

Apr 29, 2024

Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation

Apr 19, 2024

Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm

Apr 04, 2024

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

Feb 25, 2024

Rethinking Model-based, Policy-based, and Value-based Reinforcement Learning via the Lens of Representation Complexity

Dec 28, 2023

Gibbs Sampling from Human Feedback: A Provable KL-constrained Framework for RLHF

Dec 18, 2023

Horizon-Free and Instance-Dependent Regret Bounds for Reinforcement Learning with General Function Approximation

Dec 07, 2023

Posterior Sampling for Competitive RL: Function Approximation and Partial Observation

Oct 30, 2023