Eric Malmi

West-of-N: Synthetic Preference Generation for Improved Reward Modeling

Jan 22, 2024

Alizée Pace, Jonathan Mallinson, Eric Malmi, Sebastian Krause, Aliaksei Severyn

Abstract: The success of reinforcement learning from human feedback (RLHF) in language model alignment is strongly dependent on the quality of the underlying reward model. In this paper, we present a novel approach to improve reward model quality by generating synthetic preference data, thereby augmenting the training dataset with on-policy, high-quality preference pairs. Motivated by the promising results of Best-of-N sampling strategies in language model training, we extend their application to reward model training. This results in a self-training strategy to generate preference pairs by selecting the best and worst candidates in a pool of responses to a given query. Empirically, we find that this approach improves the performance of any reward model, with an effect comparable to the addition of a similar quantity of human preference data. This work opens up new avenues of research for improving RLHF for language model alignment, by offering synthetic preference generation as a solution to reward modeling challenges.
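
The best-and-worst selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `west_of_n_pair`, the `reward_fn` callable, and the length-based toy reward are all hypothetical stand-ins for a real policy's samples and a trained reward model.

```python
from typing import Callable, Optional, Sequence, Tuple


def west_of_n_pair(
    query: str,
    candidates: Sequence[str],
    reward_fn: Callable[[str, str], float],
) -> Optional[Tuple[str, str]]:
    """Return (preferred, rejected) from a pool of N sampled responses.

    reward_fn is a hypothetical stand-in for a base reward model that
    scores a (query, response) pair; higher means better.
    """
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to form a pair")
    scores = [reward_fn(query, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    worst = min(range(len(scores)), key=scores.__getitem__)
    if scores[best] == scores[worst]:
        # All candidates tie, so no informative preference pair exists.
        return None
    return candidates[best], candidates[worst]


def toy_reward(query: str, response: str) -> float:
    # Assumption for illustration only: longer responses score higher.
    return float(len(response))


pool = ["ok", "a longer answer", "mid one"]
pair = west_of_n_pair("Explain RLHF.", pool, toy_reward)
print(pair)
```

Pairs produced this way would then be added to the reward model's training set as synthetic preference data. How many candidates to sample and whether to filter low-confidence pairs are choices the abstract does not specify.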

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023

Gemini Team, Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth(+930 more)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of Gemini models in cross-modal reasoning and language understanding will enable a wide variety of use cases and we discuss our approach toward deploying them responsibly to users.

Small Language Models Improve Giants by Rewriting Their Outputs

May 22, 2023

Giorgos Vernikos, Arthur Bražinskas, Jakub Adamek, Jonathan Mallinson, Aliaksei Severyn, Eric Malmi

Abstract: Large language models (LLMs) have demonstrated impressive few-shot learning capabilities, but they often underperform compared to fine-tuned models on challenging tasks. Furthermore, their large size and restricted access only through APIs make task-specific fine-tuning impractical. Moreover, LLMs are sensitive to different aspects of prompts (e.g., the selection and order of demonstrations) and can thus require time-consuming prompt engineering. In this light, we propose a method to correct LLM outputs without relying on their weights. First, we generate a pool of candidates by few-shot prompting an LLM. Second, we refine the LLM-generated outputs using a smaller model, the LM-corrector (LMCor), which is trained to rank, combine and rewrite the candidates to produce the final target output. Our experiments demonstrate that even a small LMCor model (250M) substantially improves the few-shot performance of LLMs (62B) across diverse tasks. Moreover, we illustrate that the LMCor exhibits robustness against different prompts, thereby minimizing the need for extensive prompt engineering. Finally, we showcase that the LMCor can be seamlessly integrated with different LLMs at inference time, serving as a plug-and-play module to improve their performance.

Teaching Small Language Models to Reason

Dec 19, 2022

Lucie Charlotte Magister, Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn

Abstract: Chain of thought prompting successfully improves the reasoning capabilities of large language models, achieving state-of-the-art results on a range of datasets. However, these reasoning capabilities only appear to emerge in models with a size of over 100 billion parameters. In this paper, we explore the transfer of such reasoning capabilities to models with fewer than 100 billion parameters via knowledge distillation. Specifically, we finetune a student model on the chain of thought outputs generated by a larger teacher model. Our experiments show that the proposed method improves task performance across arithmetic, commonsense and symbolic reasoning datasets. For example, the accuracy of T5 XXL on GSM8K improves from 8.11% to 21.99% when finetuned on PaLM-540B generated chains of thought.

Text Generation with Text-Editing Models

Jun 14, 2022

Eric Malmi, Yue Dong, Jonathan Mallinson, Aleksandr Chuklin, Jakub Adamek, Daniil Mirylenka, Felix Stahlberg, Sebastian Krause, Shankar Kumar, Aliaksei Severyn

Abstract: Text-editing models have recently become a prominent alternative to seq2seq models for monolingual text-generation tasks such as grammatical error correction, simplification, and style transfer. These tasks share a common trait - they exhibit a large amount of textual overlap between the source and target texts. Text-editing models take advantage of this observation and learn to generate the output by predicting edit operations applied to the source sequence. In contrast, seq2seq models generate outputs word-by-word from scratch, thus making them slow at inference time. Text-editing models provide several benefits over seq2seq models, including faster inference speed, higher sample efficiency, and better control and interpretability of the outputs. This tutorial provides a comprehensive overview of text-editing models and current state-of-the-art approaches, and analyzes their pros and cons. We discuss challenges related to productionization and how these models can be used to mitigate hallucination and bias, both pressing challenges in the field of text generation.

* Accepted as a tutorial at NAACL 2022
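
To make "predicting edit operations applied to the source sequence" concrete, here is a small sketch of how a predicted edit sequence could be applied to source tokens. The tag scheme (`KEEP`/`DELETE` plus an optional added phrase) and the function name `apply_edits` are simplifying assumptions for illustration. Real text-editing models differ in their exact operation sets, and the tags below are hand-written rather than predicted.

```python
from typing import List, Optional, Sequence, Tuple

# One edit per source token: (tag, added_phrase).
# tag is "KEEP" or "DELETE"; added_phrase, if present, is emitted
# immediately before the position of the source token.
Edit = Tuple[str, Optional[str]]


def apply_edits(tokens: Sequence[str], edits: Sequence[Edit]) -> List[str]:
    """Realize the target text from source tokens and per-token edits."""
    if len(tokens) != len(edits):
        raise ValueError("need exactly one edit per source token")
    output: List[str] = []
    for token, (tag, added) in zip(tokens, edits):
        if added:
            output.extend(added.split())
        if tag == "KEEP":
            output.append(token)
        elif tag != "DELETE":
            raise ValueError(f"unknown tag: {tag}")
    return output


# Sentence-fusion example with hand-written (not model-predicted) edits.
source = "Turing was born in 1912 . He died in 1954 .".split()
edits: List[Edit] = (
    [("KEEP", None)] * 5          # Turing was born in 1912
    + [("DELETE", None)]          # drop the first full stop
    + [("DELETE", "and")]         # replace "He" with "and"
    + [("KEEP", None)] * 4        # died in 1954 .
)
print(" ".join(apply_edits(source, edits)))
```

Because most tokens are simply kept, the model only has to decide on a small number of changes, which is the overlap property the abstract points to.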

EdiT5: Semi-Autoregressive Text-Editing with T5 Warm-Start

May 24, 2022

Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn

Abstract: We present EdiT5 - a novel semi-autoregressive text-editing approach designed to combine the strengths of non-autoregressive text-editing and autoregressive decoding. EdiT5 is faster at inference time than conventional sequence-to-sequence (seq2seq) models, while being capable of modeling flexible input-output transformations. This is achieved by decomposing the generation process into three sub-tasks: (1) tagging to decide on the subset of input tokens to be preserved in the output, (2) re-ordering to define their order in the output text, and (3) insertion to infill the missing tokens that are not present in the input. The tagging and re-ordering steps, which are responsible for generating the largest portion of the output, are non-autoregressive, while the insertion uses an autoregressive decoder. Depending on the task, EdiT5 requires significantly fewer autoregressive steps, demonstrating speedups of up to 25x when compared to classic seq2seq models. Quality-wise, EdiT5 is initialized with a pre-trained T5 checkpoint, yielding performance comparable to T5 in high-resource settings and clearly outperforming it in low-resource settings when evaluated on three NLG tasks: Sentence Fusion, Grammatical Error Correction, and Decontextualization.
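
The three sub-tasks named in the abstract (tagging, re-ordering, insertion) can be illustrated by how their outputs would combine into a final token sequence. The sketch below only shows that composition. The function name `compose_edit` and the data layout (a keep mask, an index order, and a slot-to-tokens map) are assumptions made for this example, and all three inputs are hand-written here rather than produced by a model.

```python
from typing import Dict, List, Sequence


def compose_edit(
    tokens: Sequence[str],
    keep: Sequence[bool],
    order: Sequence[int],
    insertions: Dict[int, List[str]],
) -> List[str]:
    """Combine tagging, re-ordering and insertion decisions.

    keep[i]       -- tagging: whether source token i survives
    order         -- re-ordering: kept source indices in output order
    insertions[p] -- insertion: new tokens placed before the p-th kept
                     token (p == len(order) means at the very end)
    """
    kept = [i for i, flag in enumerate(keep) if flag]
    if len(keep) != len(tokens) or sorted(order) != kept:
        raise ValueError("order must be a permutation of the kept indices")
    output: List[str] = []
    for position, source_index in enumerate(order):
        output.extend(insertions.get(position, []))
        output.append(tokens[source_index])
    output.extend(insertions.get(len(order), []))
    return output


source = ["well", ",", "yesterday", "the", "cat", "sat"]
result = compose_edit(
    source,
    keep=[False, False, True, True, True, True],  # drop "well ,"
    order=[3, 4, 5, 2],                           # move "yesterday" last
    insertions={3: ["there"]},                    # infill one new token
)
print(" ".join(result))
```

In this example only one token ("there") is newly generated, which hints at why a model that generates just the inserted tokens autoregressively can need far fewer decoding steps than one that generates the whole output.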

Controlled Text Generation as Continuous Optimization with Multiple Constraints

Aug 04, 2021

Sachin Kumar, Eric Malmi, Aliaksei Severyn, Yulia Tsvetkov

Abstract: As large-scale language model pretraining pushes the state-of-the-art in text generation, recent work has turned to controlling attributes of the text such models generate. While modifying the pretrained models via fine-tuning remains the popular approach, it incurs a significant computational cost and can be infeasible due to lack of appropriate data. As an alternative, we propose MuCoCO -- a flexible and modular algorithm for controllable inference from pretrained models. We formulate the decoding process as an optimization problem which allows for multiple attributes we aim to control to be easily incorporated as differentiable constraints to the optimization. By relaxing this discrete optimization to a continuous one, we make use of Lagrangian multipliers and gradient-descent-based techniques to generate the desired text. We evaluate our approach on controllable machine translation and style transfer with multiple sentence-level attributes and observe significant improvements over baselines.
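
The general shape of "decoding as constrained optimization with Lagrangian multipliers" can be written out as below. This is the textbook form of a constrained problem and a basic gradient descent-ascent update, offered only as orientation; the symbols $f$, $g_i$, $\epsilon_i$, $\eta_y$ and $\eta_\lambda$ are assumptions of this sketch, and the paper's exact objective and update rule may differ. Here $y$ stands for the relaxed (continuous) output sequence, $f$ for a primary objective such as the negative log-likelihood under the pretrained model, and each $g_i$ for a differentiable attribute constraint with threshold $\epsilon_i$.

```latex
% Constrained decoding problem over the relaxed output y
\min_{y} \; f(y)
\quad \text{subject to} \quad
g_i(y) \le \epsilon_i, \qquad i = 1, \dots, k

% Lagrangian with one multiplier per constraint
\mathcal{L}(y, \lambda) = f(y) + \sum_{i=1}^{k} \lambda_i \bigl( g_i(y) - \epsilon_i \bigr),
\qquad \lambda_i \ge 0

% Gradient descent on y, projected gradient ascent on the multipliers
y \leftarrow y - \eta_y \, \nabla_y \mathcal{L}(y, \lambda),
\qquad
\lambda_i \leftarrow \max\bigl( 0, \; \lambda_i + \eta_\lambda \, ( g_i(y) - \epsilon_i ) \bigr)
```

A multiplier grows while its constraint is violated and shrinks toward zero once the constraint is satisfied, so several attributes can be balanced without hand-tuning fixed weights.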

A Simple Recipe for Multilingual Grammatical Error Correction

Jun 07, 2021

Sascha Rothe, Jonathan Mallinson, Eric Malmi, Sebastian Krause, Aliaksei Severyn

Abstract: This paper presents a simple recipe to train state-of-the-art multilingual Grammatical Error Correction (GEC) models. We achieve this by first proposing a language-agnostic method to generate a large number of synthetic examples. The second ingredient is to use large-scale multilingual language models (up to 11B parameters). Once fine-tuned on language-specific supervised sets, we surpass the previous state-of-the-art results on GEC benchmarks in four languages: English, Czech, German and Russian. Having established a new set of baselines for GEC, we make our results easily reproducible and accessible by releasing the cLang-8 dataset. It is produced by using our best model, which we call gT5, to clean the targets of the widely used yet noisy lang-8 dataset. cLang-8 greatly simplifies typical GEC training pipelines composed of multiple fine-tuning stages -- we demonstrate that performing a single fine-tuning step on cLang-8 with the off-the-shelf language models yields further accuracy improvements over an already top-performing gT5 model for English.

Semantically Driven Sentence Fusion: Modeling and Evaluation

Oct 06, 2020

Eyal Ben-David, Orgad Keller, Eric Malmi, Idan Szpektor, Roi Reichart

Abstract: Sentence fusion is the task of joining related sentences into coherent text. Current training and evaluation schemes for this task are based on single-reference ground truths and do not account for valid fusion variants. We show that this hinders models from robustly capturing the semantic relationship between input sentences. To alleviate this, we present an approach in which ground-truth solutions are automatically expanded into multiple references via curated equivalence classes of connective phrases. We apply this method to a large-scale dataset and use the augmented dataset for both model training and evaluation. To improve the learning of semantic representation using multiple references, we enrich the model with auxiliary discourse classification tasks under a multi-tasking framework. Our experiments highlight the improvements of our approach over state-of-the-art models.

* This paper was accepted to Findings of EMNLP 2020

Unsupervised Text Style Transfer with Padded Masked Language Models

Oct 02, 2020

Eric Malmi, Aliaksei Severyn, Sascha Rothe

Abstract: We propose Masker, an unsupervised text-editing method for style transfer. To tackle cases when no parallel source-target pairs are available, we train masked language models (MLMs) for both the source and the target domain. Then we find the text spans where the two models disagree the most in terms of likelihood. This allows us to identify the source tokens to delete to transform the source text to match the style of the target domain. The deleted tokens are replaced with the target MLM, and by using a padded MLM variant, we avoid having to predetermine the number of inserted tokens. Our experiments on sentence fusion and sentiment transfer demonstrate that Masker performs competitively in a fully unsupervised setting. Moreover, in low-resource settings, it improves supervised methods' accuracy by over 10 percentage points when pre-training them on silver training data generated by Masker.

* EMNLP 2020
