Xuerui Su

Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning

Apr 06, 2025
Via arXiv

Reveal the Mystery of DPO: The Connection between DPO and RL Algorithms

Feb 05, 2025
Via arXiv