Fangyuan Li

Online DPO: Online Direct Preference Optimization with Fast-Slow Chasing

Jun 08, 2024
Via arXiv