Pete Shaw

Robust Preference Optimization through Reward Model Distillation

May 29, 2024