Title: Suri: Multi-constraint Instruction Following for Long-form Text Generation

Jun 27, 2024

Chau Minh Pham, Simeng Sun, Mohit Iyyer

Abstract: Existing research on instruction following largely focuses on tasks with simple instructions and short responses. In this work, we explore multi-constraint instruction following for generating long-form text. We create Suri, a dataset with 20K human-written long-form texts paired with LLM-generated backtranslated instructions that contain multiple complex constraints. Because of prohibitive challenges associated with collecting human preference judgments on long-form texts, preference-tuning algorithms such as DPO are infeasible in our setting; thus, we propose Instructional ORPO (I-ORPO), an alignment method based on the ORPO algorithm. Instead of receiving negative feedback from dispreferred responses, I-ORPO obtains negative feedback from synthetically corrupted instructions generated by an LLM. Using Suri, we perform supervised and I-ORPO fine-tuning on Mistral-7b-Instruct-v0.2. The resulting models, Suri-SFT and Suri-I-ORPO, generate significantly longer texts (~5K tokens) than base models without significant quality deterioration. Our human evaluation shows that while both SFT and I-ORPO models satisfy most constraints, Suri-I-ORPO generations are generally preferred for their coherent and informative incorporation of the constraints. We release our code at https://github.com/chtmp223/suri.
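The abstract does not spell out the I-ORPO objective, so the following is only a rough sketch of the idea it describes, not the authors' implementation (see the linked repository for that). It assumes the usual ORPO form, a supervised term plus a weighted odds-ratio term, and it assumes that the "rejected" side is the same human-written response scored under a corrupted instruction instead of a dispreferred response. The function names, the weight `lam`, and the use of length-averaged log-probabilities as inputs are all hypothetical choices made for illustration.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp), where avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def i_orpo_loss(avg_logp_original: float,
                avg_logp_corrupted: float,
                lam: float = 0.1) -> float:
    """Sketch of an ORPO-style loss with instruction-level negatives.

    avg_logp_original:  length-averaged log-prob of the response given the
                        original (correct) instruction.
    avg_logp_corrupted: length-averaged log-prob of the SAME response given
                        the synthetically corrupted instruction.
    lam:                hypothetical weight on the odds-ratio term.
    """
    # Supervised term: negative log-likelihood under the original instruction.
    sft_term = -avg_logp_original
    # Odds-ratio term: -log sigmoid(log-odds difference) == softplus(-diff).
    diff = log_odds(avg_logp_original) - log_odds(avg_logp_corrupted)
    or_term = softplus(-diff)
    return sft_term + lam * or_term
```

Under these assumptions, the odds-ratio term shrinks as the model assigns the response a higher likelihood under the original instruction than under the corrupted one, which is the sense in which the corrupted instruction supplies negative feedback.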
