Picture for Yirong Sun

Yirong Sun

The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models

Add code
Oct 09, 2024
Viaarxiv icon