Ziniu Li

Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity

Aug 29, 2024, via arXiv

Adam-mini: Use Fewer Learning Rates To Gain More

Jun 26, 2024, via arXiv

BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation

May 27, 2024, via arXiv

On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization

May 26, 2024, via arXiv

Why Transformers Need Adam: A Hessian Perspective

Feb 26, 2024, via arXiv

Policy Optimization in RLHF: The Impact of Out-of-preference Data

Dec 17, 2023, via arXiv

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models

Oct 17, 2023, via arXiv

Provably Efficient Adversarial Imitation Learning with Unknown Transitions

Jun 11, 2023, via arXiv

Deploying Offline Reinforcement Learning with Human Feedback

Mar 13, 2023, via arXiv

Theoretical Analysis of Offline Imitation With Supplementary Dataset

Jan 27, 2023, via arXiv