Picture for Ziniu Li

Ziniu Li

Controlling Large Language Model with Latent Actions

Add code
Mar 27, 2025
Viaarxiv icon

RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques

Add code
Jan 24, 2025
Viaarxiv icon

Enabling Scalable Oversight via Self-Evolving Critic

Add code
Jan 10, 2025
Viaarxiv icon

Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity

Add code
Aug 29, 2024
Viaarxiv icon

Adam-mini: Use Fewer Learning Rates To Gain More

Add code
Jun 26, 2024
Viaarxiv icon

BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation

Add code
May 27, 2024
Viaarxiv icon

On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization

Add code
May 26, 2024
Viaarxiv icon

Why Transformers Need Adam: A Hessian Perspective

Add code
Feb 26, 2024
Viaarxiv icon

Policy Optimization in RLHF: The Impact of Out-of-preference Data

Add code
Dec 17, 2023
Viaarxiv icon

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models

Add code
Oct 17, 2023
Viaarxiv icon