Picture for Sambit Sahu

Sambit Sahu

RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization

Add code
Oct 05, 2024
Viaarxiv icon

Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey

Add code
Sep 17, 2024
Viaarxiv icon