Xing Huang

Preference as Reward, Maximum Preference Optimization with Importance Sampling

Jan 08, 2024
Via arXiv