Title: RankGen: Improving Text Generation with Large Ranking Models

May 19, 2022

Kalpesh Krishna, Yapei Chang, John Wieting, Mohit Iyyer

Abstract: Given an input sequence (or prefix), modern language models often assign high probabilities to output sequences that are repetitive, incoherent, or irrelevant to the prefix; as such, model-generated text also contains such artifacts. To address these issues, we present RankGen, an encoder model (1.2B parameters) that scores model generations given a prefix. RankGen can be flexibly incorporated as a scoring function in beam search and used to decode from any pretrained language model. We train RankGen using large-scale contrastive learning to map a prefix close to the ground-truth sequence that follows it and far away from two types of negatives: (1) random sequences from the same document as the prefix, which discourage topically-similar but irrelevant generations; (2) sequences generated from a large language model conditioned on the prefix, which discourage repetition and hallucination. Experiments across four different language models (345M-11B parameters) and two domains show that RankGen significantly outperforms decoding algorithms like nucleus, top-k, and typical sampling on both automatic metrics (85.0 vs 77.3 MAUVE) as well as human evaluations with English writers (74.5% human preference over nucleus sampling). Analysis reveals that RankGen outputs are more relevant to the prefix and improve continuity and coherence compared to baselines. We open source our model checkpoints, code, and human preferences with detailed explanations for future research.
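The scoring idea in the abstract can be illustrated with a minimal sketch: embed the prefix and each candidate continuation, score each pair, and keep the highest-scoring candidate. Everything named here is an assumption for illustration and not the released RankGen code: `toy_embed` is a hypothetical hashed bag-of-words stand-in for the 1.2B-parameter encoder, `score` assumes a dot product between normalized vectors as the similarity, and `rerank` simply sorts a fixed candidate list rather than running beam search.

```python
import hashlib
import math


def toy_embed(text, dim=16):
    """Hypothetical stand-in for a learned encoder.

    Hashes each whitespace token into one of `dim` buckets and returns
    an L2-normalized count vector. A real system would use a trained model.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def score(prefix, candidate, embed=toy_embed):
    """Dot product of prefix and candidate embeddings (assumed similarity)."""
    p = embed(prefix)
    c = embed(candidate)
    return sum(a * b for a, b in zip(p, c))


def rerank(prefix, candidates, embed=toy_embed):
    """Return (score, candidate) pairs sorted from best to worst."""
    scored = [(score(prefix, cand, embed), cand) for cand in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    prefix = "The storm knocked out power across the city"
    candidates = [
        "and crews worked overnight to restore power to the city",
        "the the the the the the",
        "Bananas are a good source of potassium",
    ]
    for s, cand in rerank(prefix, candidates):
        print(f"{s:.3f}  {cand}")
```

In practice the candidates would be samples drawn from a pretrained language model, and the same scoring function could be applied at each step of a beam search instead of once over complete continuations.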

* Preprint (34 pages), code and pretrained model checkpoints will be provided at https://github.com/martiansideofthemoon/rankgen
