Title: AIPO: Improving Training Objective for Iterative Preference Optimization

Sep 13, 2024

Yaojie Shen, Xinyao Wang, Yulei Niu, Ying Zhou, Lexin Tang, Libo Zhang, Fan Chen, Longyin Wen

Abstract: Preference Optimization (PO) is gaining popularity as an alternative to Proximal Policy Optimization (PPO) for aligning Large Language Models (LLMs). Recent research on aligning LLMs iteratively with synthetic or partially synthetic data shows promising results in scaling up PO training, both in academic settings and for proprietary trained models such as Llama3. Despite this success, our study shows that the length exploitation issue present in PO is even more severe in Iterative Preference Optimization (IPO) because of the iterative nature of the process. In this work, we study iterative preference optimization with synthetic data and share the findings and analysis from building the pipeline. More specifically, we discuss the length exploitation issue during iterative preference optimization and propose a training objective for it, namely Agreement-aware Iterative Preference Optimization (AIPO). To demonstrate the effectiveness of our method, we conduct comprehensive experiments and achieve state-of-the-art performance on MT-Bench, AlpacaEval 2.0, and Arena-Hard. Our implementation and model checkpoints will be made available at https://github.com/bytedance/AIPO.
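
The abstract does not spell out the AIPO objective, so the sketch below does not attempt to reproduce it. As background only, it shows the standard DPO loss, one widely used preference optimization objective, for a single preference pair, together with a simple length diagnostic of the kind one might track across iterations when looking for length exploitation. The function names, the `beta` default, and the whitespace-token length measure are assumptions made for illustration and are not taken from the paper.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (NOT the AIPO objective).

    Inputs are summed sequence log-probabilities under the policy and a
    frozen reference model. `beta` is an assumed default.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def mean_length_gap(pairs):
    """Hypothetical diagnostic: mean (chosen - rejected) length.

    `pairs` is a list of (chosen_text, rejected_text); length is counted in
    whitespace-separated tokens as a rough stand-in for model tokens.
    """
    gaps = [len(chosen.split()) - len(rejected.split())
            for chosen, rejected in pairs]
    return sum(gaps) / len(gaps)
```

A length gap that grows from one iteration to the next would be consistent with the length exploitation the abstract describes, although the paper's own analysis may measure it differently.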
