Title: IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization

Nov 09, 2024

Xinghua Zhang, Haiyang Yu, Cheng Fu, Fei Huang, Yongbin Li

Abstract: In the realm of large language models (LLMs), the ability of models to accurately follow instructions is paramount as more agents and applications leverage LLMs for construction, where the complexity of instructions is rapidly increasing. However, on the one hand, only a limited amount of complex instruction evaluation data exists; on the other hand, there are no dedicated algorithms for improving the ability to follow complex instructions. To this end, this paper introduces TRACE, a benchmark for improving and evaluating complex instruction-following ability, which consists of 120K training data and 1K evaluation data. Furthermore, we propose the IOPO (Input-Output Preference Optimization) alignment method, which takes both input and output preference pairs into consideration, so that LLMs not only rapidly align with response preferences but also meticulously explore the instruction preferences. Extensive experiments on both in-domain and out-of-domain datasets confirm the effectiveness of IOPO, with improvements of 8.15% and 2.18% on in-domain data and 6.29% and 3.13% on out-of-domain data over SFT and DPO, respectively.
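
To make the distinction between output and input preference pairs concrete, the following is a minimal sketch, not the paper's objective. The first call is the standard DPO loss for one pair of responses to the same instruction. The second call reuses the same formula with the roles swapped (one response scored under two different instructions); this is only an assumed, illustrative reading of "input preference". The function names, the log-probability values, and the choice to sum the two terms are all hypothetical.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def preference_loss(policy_pref: float, policy_dispref: float,
                    ref_pref: float, ref_dispref: float,
                    beta: float = 0.1) -> float:
    """DPO-style loss for one preference pair.

    Arguments are sequence log-probabilities under the trained policy and
    under a frozen reference model; beta scales the implicit reward margin.
    """
    margin = (policy_pref - ref_pref) - (policy_dispref - ref_dispref)
    return -log_sigmoid(beta * margin)


# Output preference (standard DPO): one instruction x, responses y_w and y_l.
# Values are made-up log p(y | x) numbers for illustration.
output_loss = preference_loss(policy_pref=-12.0, policy_dispref=-15.0,
                              ref_pref=-13.0, ref_dispref=-14.0)

# Input preference (ASSUMPTION, not taken from the paper): one response y,
# scored under the instruction it satisfies and under a variant it violates.
input_loss = preference_loss(policy_pref=-12.0, policy_dispref=-18.0,
                             ref_pref=-13.0, ref_dispref=-16.0)

# Hypothetical combination: a plain sum of the two terms.
combined_loss = output_loss + input_loss
print(f"output={output_loss:.4f} input={input_loss:.4f} combined={combined_loss:.4f}")
```

When the policy and the reference agree on both items of a pair, the margin is zero and the loss equals log 2; the loss falls below that as the policy separates the preferred item from the dispreferred one more strongly than the reference does.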

* Work in progress
