JinJin Li

RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models

Feb 15, 2024
Via arXiv