Haipeng Luo

Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena

Jul 15, 2024
Via arXiv

Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms

Jun 15, 2024
Via arXiv

No-Regret Learning for Fair Multi-Agent Social Welfare Optimization

May 31, 2024
Via arXiv

Provably Efficient Interactive-Grounded Learning with Personalized Reward

May 31, 2024
Via arXiv

Optimal Multiclass U-Calibration Error and Beyond

May 28, 2024
Via arXiv

Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback

May 14, 2024
Via arXiv

Tractable Local Equilibria in Non-Concave Games

Mar 13, 2024
Via arXiv

Contextual Multinomial Logit Bandits with General Value Functions

Feb 18, 2024
Via arXiv

Efficient Contextual Bandits with Uninformed Feedback Graphs

Feb 12, 2024
Via arXiv

Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games

Jan 26, 2024
Via arXiv