Avital Zipori

Multi-turn Reinforcement Learning from Preference Human Feedback

May 23, 2024
Via arXiv