Mingyue Huo

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

Jun 30, 2024
Via arXiv