Jakub Adamek

Small Language Models Improve Giants by Rewriting Their Outputs

May 22, 2023

Giorgos Vernikos, Arthur Bražinskas, Jakub Adamek, Jonathan Mallinson, Aliaksei Severyn, Eric Malmi

Abstract: Large language models (LLMs) have demonstrated impressive few-shot learning capabilities, but they often underperform fine-tuned models on challenging tasks. Furthermore, their large size and the restriction of access to APIs make task-specific fine-tuning impractical. Moreover, LLMs are sensitive to different aspects of prompts (e.g., the selection and order of demonstrations) and can thus require time-consuming prompt engineering. In this light, we propose a method to correct LLM outputs without relying on their weights. First, we generate a pool of candidates by few-shot prompting an LLM. Second, we refine the LLM-generated outputs using a smaller model, the LM-corrector (LMCor), which is trained to rank, combine and rewrite the candidates to produce the final target output. Our experiments demonstrate that even a small LMCor model (250M parameters) substantially improves the few-shot performance of LLMs (62B parameters) across diverse tasks. Moreover, we find that the LMCor is robust to different prompts, thereby minimizing the need for extensive prompt engineering. Finally, we show that the LMCor can be seamlessly integrated with different LLMs at inference time, serving as a plug-and-play module to improve their performance.
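The two-stage pipeline in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `llm_sample` and `corrector` are hypothetical callables standing in for a frozen LLM queried through an API and for the small trained corrector, the prompt template and the `[source]`/`[candN]` serialization are invented for the example, and `majority_corrector` is a toy that only ranks candidates (a trained LMCor can also combine and rewrite them).

```python
from collections import Counter
from itertools import cycle
from typing import Callable, List, Sequence, Tuple


def build_few_shot_prompt(demos: Sequence[Tuple[str, str]], source: str) -> str:
    """Concatenate (input, output) demonstrations, then the new input to complete."""
    parts = [f"Input: {x}\nOutput: {y}" for x, y in demos]
    parts.append(f"Input: {source}\nOutput:")
    return "\n\n".join(parts)


def format_corrector_input(source: str, candidates: List[str]) -> str:
    """Hypothetical serialization: the source text followed by numbered candidates."""
    tagged = " ".join(f"[cand{i}] {c}" for i, c in enumerate(candidates, 1))
    return f"[source] {source} {tagged}"


def lm_correct(
    source: str,
    demos: Sequence[Tuple[str, str]],
    llm_sample: Callable[[str], str],  # stage 1 (hypothetical): frozen LLM behind an API
    corrector: Callable[[str], str],   # stage 2 (hypothetical): small trained corrector
    num_candidates: int = 5,
) -> str:
    """Sample a candidate pool from the LLM, then let the corrector produce the output."""
    prompt = build_few_shot_prompt(demos, source)
    candidates = [llm_sample(prompt) for _ in range(num_candidates)]
    return corrector(format_corrector_input(source, candidates))


def majority_corrector(corrector_input: str) -> str:
    """Toy stand-in for a trained corrector: returns the most frequent candidate.
    It only ranks; it never combines or rewrites candidates."""
    chunks = corrector_input.split("[cand")[1:]
    candidates = [chunk.split("] ", 1)[1].strip() for chunk in chunks]
    return Counter(candidates).most_common(1)[0][0]


if __name__ == "__main__":
    samples = cycle(["the cat sat", "a cat sat", "the cat sat"])

    def toy_llm(prompt: str) -> str:
        return next(samples)  # ignores the prompt and replays canned outputs

    print(lm_correct("le chat s'est assis", [("le chien", "the dog")], toy_llm, majority_corrector))
```

Because the corrector sees only the source and the sampled candidates, the LLM's weights are never touched; in this sketch, swapping in a different LLM means passing a different `llm_sample` callable, which is one way to read the "plug-and-play" claim.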

Teaching Small Language Models to Reason

Dec 19, 2022

Lucie Charlotte Magister, Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn

Abstract: Chain-of-thought prompting successfully improves the reasoning capabilities of large language models, achieving state-of-the-art results on a range of datasets. However, these reasoning capabilities only appear to emerge in models with more than 100 billion parameters. In this paper, we explore the transfer of such reasoning capabilities to models with fewer than 100 billion parameters via knowledge distillation. Specifically, we finetune a student model on the chain-of-thought outputs generated by a larger teacher model. Our experiments show that the proposed method improves task performance across arithmetic, commonsense and symbolic reasoning datasets. For example, the accuracy of T5 XXL on GSM8K improves from 8.11% to 21.99% when finetuned on chains of thought generated by PaLM-540B.
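The data-preparation side of this distillation can be sketched as below. It is a hedged illustration, not the paper's code: `teacher` is a hypothetical callable standing in for few-shot chain-of-thought prompting of the large model, the "The answer is X." convention is an assumed output format, and discarding rationales whose final answer disagrees with the gold label is an optional assumption controlled by `keep_only_correct`. The student is then finetuned on the resulting (question, rationale) pairs with any seq2seq trainer.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Example:
    question: str
    answer: str  # gold final answer from the original dataset


def extract_final_answer(rationale: str) -> Optional[str]:
    """Assumes the teacher ends each rationale with 'The answer is X.'"""
    match = re.search(r"The answer is\s*(.+?)\.?\s*$", rationale.strip())
    return match.group(1).strip() if match else None


def build_distillation_set(
    examples: List[Example],
    teacher: Callable[[str], str],  # hypothetical: few-shot CoT prompting of the teacher
    keep_only_correct: bool = True,
) -> List[Tuple[str, str]]:
    """Return (input, target) pairs whose target is the teacher's full chain of thought."""
    pairs: List[Tuple[str, str]] = []
    for ex in examples:
        rationale = teacher(ex.question)
        if keep_only_correct and extract_final_answer(rationale) != ex.answer:
            continue  # assumption: drop rationales that reach the wrong answer
        pairs.append((ex.question, rationale))
    return pairs


if __name__ == "__main__":
    canned = {
        "Tom has 2 apples and buys 2 more. How many?": "2 + 2 = 4. The answer is 4.",
        "What is 3 times 3?": "3 * 3 = 6. The answer is 6.",
    }
    data = [Example(q, a) for q, a in zip(canned, ["4", "9"])]
    for source, target in build_distillation_set(data, canned.__getitem__):
        print(source, "->", target)
```

The student's target is the whole rationale rather than only the final answer, so the student is trained to produce the intermediate reasoning steps as well.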

Text Generation with Text-Editing Models

Jun 14, 2022

Eric Malmi, Yue Dong, Jonathan Mallinson, Aleksandr Chuklin, Jakub Adamek, Daniil Mirylenka, Felix Stahlberg, Sebastian Krause, Shankar Kumar, Aliaksei Severyn

Abstract: Text-editing models have recently become a prominent alternative to seq2seq models for monolingual text-generation tasks such as grammatical error correction, simplification, and style transfer. These tasks share a common trait: they exhibit a large amount of textual overlap between the source and target texts. Text-editing models take advantage of this observation and learn to generate the output by predicting edit operations applied to the source sequence. In contrast, seq2seq models generate outputs word by word from scratch, which makes them slow at inference time. Text-editing models provide several benefits over seq2seq models, including faster inference, higher sample efficiency, and better control and interpretability of the outputs. This tutorial provides a comprehensive overview of text-editing models and current state-of-the-art approaches, and analyzes their pros and cons. We discuss challenges related to productionization and how these models can be used to mitigate hallucination and bias, both pressing challenges in the field of text generation.

* Accepted as a tutorial at NAACL 2022
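Predicting edit operations instead of whole outputs can be illustrated with one common formulation, sketched here as an assumption rather than as any specific model from the tutorial: each source token receives a tag made of a base operation (`KEEP` or `DELETE`) plus an optional phrase inserted before it. The `Tag` type and `apply_edits` function are hypothetical names chosen for the example.

```python
from typing import List, Optional, Tuple

# One tag per source token: a base operation plus an optional phrase to insert before it.
Tag = Tuple[str, Optional[str]]


def apply_edits(source: List[str], tags: List[Tag]) -> List[str]:
    """Realize the target by applying per-token edit operations to the source."""
    if len(source) != len(tags):
        raise ValueError("expected exactly one tag per source token")
    output: List[str] = []
    for token, (op, phrase) in zip(source, tags):
        if phrase:                 # insert the phrase (if any) before the token
            output.extend(phrase.split())
        if op == "KEEP":           # copy the source token unchanged
            output.append(token)
        elif op != "DELETE":       # DELETE simply drops the token
            raise ValueError(f"unknown operation: {op!r}")
    return output


if __name__ == "__main__":
    src = "She go to school yesterday .".split()
    tags: List[Tag] = [("KEEP", None), ("DELETE", "went"), ("KEEP", None),
                       ("KEEP", None), ("KEEP", None), ("KEEP", None)]
    print(" ".join(apply_edits(src, tags)))  # She went to school yesterday .
```

In this grammatical-error-correction example, five of the six source tokens are copied and only one word has to be generated, which is the kind of source/target overlap that text-editing models exploit.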

EdiT5: Semi-Autoregressive Text-Editing with T5 Warm-Start

May 24, 2022

Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn

Abstract: We present EdiT5, a novel semi-autoregressive text-editing approach designed to combine the strengths of non-autoregressive text editing and autoregressive decoding. EdiT5 is faster at inference time than conventional sequence-to-sequence (seq2seq) models, while being capable of modeling flexible input-output transformations. This is achieved by decomposing the generation process into three sub-tasks: (1) tagging to decide on the subset of input tokens to be preserved in the output, (2) re-ordering to define their order in the output text, and (3) insertion to infill the missing tokens that are not present in the input. The tagging and re-ordering steps, which are responsible for generating the largest portion of the output, are non-autoregressive, while the insertion step uses an autoregressive decoder. Depending on the task, EdiT5 requires significantly fewer autoregressive steps, demonstrating speedups of up to 25x compared to classic seq2seq models. In terms of quality, EdiT5, initialized from a pre-trained T5 checkpoint, performs comparably to T5 in high-resource settings and clearly outperforms it in low-resource settings when evaluated on three NLG tasks: Sentence Fusion, Grammatical Error Correction, and Decontextualization.
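How the outputs of the three sub-tasks combine into the final text can be sketched as follows. This shows only the realization step under an assumed representation (a keep mask, a permutation over kept tokens, and insertions keyed by slot); it does not model how EdiT5 predicts these, and `realize` and its arguments are hypothetical names.

```python
from typing import Dict, List, Sequence


def realize(
    source: Sequence[str],
    keep: Sequence[bool],
    order: Sequence[int],
    insertions: Dict[int, List[str]],
) -> List[str]:
    """Combine the three predicted sub-task outputs into the final token sequence."""
    if len(keep) != len(source):
        raise ValueError("expected one keep/delete decision per source token")
    # (1) Tagging: choose which source tokens survive.
    kept = [tok for tok, k in zip(source, keep) if k]
    # (2) Re-ordering: `order` lists indices into `kept` in output order.
    if sorted(order) != list(range(len(kept))):
        raise ValueError("order must be a permutation of the kept tokens")
    reordered = [kept[i] for i in order]
    # (3) Insertion: insertions[s] holds new tokens placed before slot s
    #     (slot len(reordered) means "at the end"). Per the abstract, only this
    #     step uses an autoregressive decoder; here the tokens are simply given.
    output: List[str] = []
    for slot in range(len(reordered) + 1):
        output.extend(insertions.get(slot, []))
        if slot < len(reordered):
            output.append(reordered[slot])
    return output


if __name__ == "__main__":
    src = "Turing was born in 1912 . He died in 1954 .".split()
    keep = [True] * 5 + [False, False] + [True] * 4  # drop the first "." and "He"
    fused = realize(src, keep, order=list(range(9)), insertions={5: ["and"]})
    print(" ".join(fused))  # Turing was born in 1912 and died in 1954 .
```

In this sentence-fusion example, nine of the ten output tokens come from the non-autoregressive tagging and re-ordering steps and only "and" would need autoregressive decoding, which illustrates where the reported speedups can come from.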

Stepwise Extractive Summarization and Planning with Structured Transformers

Oct 06, 2020

Shashi Narayan, Joshua Maynez, Jakub Adamek, Daniele Pighin, Blaž Bratanič, Ryan McDonald

Abstract: We propose encoder-centric stepwise models for extractive summarization using structured transformers: HiBERT and Extended Transformers. We enable stepwise summarization by injecting the previously generated summary into the structured transformer as an auxiliary sub-structure. Our models are not only efficient in modeling the structure of long inputs, but they also do not rely on task-specific redundancy-aware modeling, making them general-purpose extractive content planners for different tasks. When evaluated on CNN/DailyMail extractive summarization, stepwise models achieve state-of-the-art performance in terms of ROUGE without any redundancy-aware modeling or sentence filtering. This also holds true for Rotowire table-to-text generation, where our models surpass previously reported metrics for content selection, planning and ordering, highlighting the strength of stepwise modeling. Of the two structured transformers we test, stepwise Extended Transformers provide the best performance across both datasets and set a new standard for these challenges.

* 17 pages, EMNLP 2020
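The stepwise selection loop can be sketched as below, assuming greedy decoding with a fixed sentence budget. `stepwise_extract` and the `Scorer` signature are hypothetical, and the scorer stands in for the structured transformer that receives the partial summary as auxiliary input. The toy `novelty_score` hand-codes redundancy avoidance only so the example runs; in the described models, the effect of earlier selections comes from injecting the summary into the encoder rather than from engineered redundancy features.

```python
from typing import Callable, List, Sequence

# Hypothetical scorer signature: (sentences, indices already selected, candidate) -> score.
Scorer = Callable[[Sequence[str], List[int], int], float]


def stepwise_extract(sentences: Sequence[str], score: Scorer, max_sentences: int) -> List[int]:
    """Select sentences one at a time, re-scoring at every step given the summary so far."""
    summary: List[int] = []
    while len(summary) < max_sentences:
        remaining = [i for i in range(len(sentences)) if i not in summary]
        if not remaining:
            break
        best = max(remaining, key=lambda i: score(sentences, summary, i))
        summary.append(best)  # the growing summary is fed back in at the next step
    return summary  # the selection order doubles as a simple content plan


def novelty_score(sentences: Sequence[str], summary: List[int], i: int) -> float:
    """Toy stand-in for the model: counts words not yet covered by the summary."""
    covered = {w for j in summary for w in sentences[j].lower().split()}
    return float(len(set(sentences[i].lower().split()) - covered))


if __name__ == "__main__":
    doc = ["The cat sat on the mat", "The cat sat on the mat today", "Dogs bark loudly"]
    print(stepwise_extract(doc, novelty_score, max_sentences=2))  # [1, 2]
```

Because every step re-scores all remaining sentences against the current summary, the near-duplicate first sentence loses out once its content is already covered.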
