Xing Huang

Preference as Reward, Maximum Preference Optimization with Importance Sampling

Jan 08, 2024
Via arXiv