Title: Self-Boosting Large Language Models with Synthetic Preference Data

Oct 09, 2024

Qingxiu Dong, Li Dong, Xingxing Zhang, Zhifang Sui, Furu Wei



Abstract: Through alignment with human preferences, Large Language Models (LLMs) have advanced significantly in generating honest, harmless, and helpful responses. However, collecting high-quality preference data is resource-intensive and demands considerable creativity, especially when LLMs are to be improved continually. We introduce SynPO, a self-boosting paradigm that leverages synthetic preference data for model alignment. SynPO uses an iterative mechanism in which a self-prompt generator creates diverse prompts and a response improver progressively refines model responses. This approach trains LLMs to autonomously learn generative rewards for their own outputs and eliminates the need for large-scale annotation of prompts and human preferences. After four SynPO iterations, Llama3-8B and Mistral-7B show significant gains in instruction following, with win-rate improvements of over 22.1% on AlpacaEval 2.0 and ArenaHard. At the same time, SynPO improves the general performance of LLMs on various tasks, as shown by average score increases of 3.2 to 5.0 on the widely used Open LLM Leaderboard.
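The iterative mechanism described in the abstract can be sketched as a small data-synthesis loop. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify how the self-prompt generator or response improver work internally, nor which preference-optimization objective is used. It is therefore assumed here that each refined response is treated as "chosen" and the model's original draft as "rejected". All names (`PreferencePair`, `synthesize_pairs`, `self_boost`, and the toy stand-ins `ToyModel`, `toy_prompts`, `toy_improver`, `toy_train`) are hypothetical. The toy functions replace real LLM calls and training so that the control flow can run on its own.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical: a "model" is anything that maps a prompt to a response.
Model = Callable[[str], str]


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # refined response, assumed to be the preferred one
    rejected: str  # the model's own first-draft response


def synthesize_pairs(
    prompts: List[str],
    respond: Model,
    improve: Callable[[str, str], str],
) -> List[PreferencePair]:
    """Build synthetic preference pairs: the refined draft is preferred over the original."""
    pairs: List[PreferencePair] = []
    for prompt in prompts:
        draft = respond(prompt)
        refined = improve(prompt, draft)
        if refined != draft:  # an unchanged response carries no preference signal
            pairs.append(PreferencePair(prompt, chosen=refined, rejected=draft))
    return pairs


def self_boost(
    model: Model,
    generate_prompts: Callable[[int], List[str]],
    make_improver: Callable[[Model], Callable[[str, str], str]],
    train_on_pairs: Callable[[Model, List[PreferencePair]], Model],
    iterations: int = 4,
    n_prompts: int = 100,
) -> Model:
    """Hypothetical self-boosting loop: synthesize prompts, refine, train, repeat."""
    for _ in range(iterations):
        prompts = generate_prompts(n_prompts)  # stand-in for the self-prompt generator
        improver = make_improver(model)        # stand-in for the response improver
        pairs = synthesize_pairs(prompts, model, improver)
        model = train_on_pairs(model, pairs)   # stand-in for a preference-optimization step
    return model


# --- Toy stand-ins so the loop runs without an actual LLM (all hypothetical) ---
@dataclass
class ToyModel:
    level: int = 0

    def __call__(self, prompt: str) -> str:
        return f"answer to {prompt} (v{self.level})"


def toy_prompts(n: int) -> List[str]:
    return [f"task-{i}" for i in range(n)]


def toy_improver(model: Model) -> Callable[[str, str], str]:
    # Pretend refinement: always appends a marker, so every pair is usable.
    return lambda prompt, draft: draft + " [refined]"


def toy_train(model: ToyModel, pairs: List[PreferencePair]) -> ToyModel:
    # Pretend training: any non-empty batch of pairs bumps the model "version".
    return ToyModel(model.level + 1) if pairs else model


if __name__ == "__main__":
    final = self_boost(ToyModel(), toy_prompts, toy_improver, toy_train,
                       iterations=4, n_prompts=3)
    print(final.level)  # → 4
```

Keeping the prompt generator, improver, and trainer as injected callables leaves the sketch agnostic about what each component actually is. In a real setup, each would wrap LLM calls and a preference-optimization trainer chosen by the practitioner.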
