Loic Barrault

NLLB Team

No Language Left Behind: Scaling Human-Centered Machine Translation

Jul 11, 2022

NLLB team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht (+29 more)

Abstract: Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today. However, such efforts have coalesced around a small subset of languages, leaving behind the vast majority of mostly low-resource languages. What does it take to break the 200-language barrier while ensuring safe, high-quality results, all while keeping ethical considerations in mind? In No Language Left Behind, we took on this challenge by first contextualizing the need for low-resource language translation support through exploratory interviews with native speakers. Then, we created datasets and models aimed at narrowing the performance gap between low- and high-resource languages. More specifically, we developed a conditional compute model based on Sparsely Gated Mixture of Experts that is trained on data obtained with novel and effective data mining techniques tailored for low-resource languages. We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. Critically, we evaluated the performance of over 40,000 different translation directions using a human-translated benchmark, Flores-200, and combined human evaluation with a novel toxicity benchmark covering all languages in Flores-200 to assess translation safety. Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art, laying important groundwork towards realizing a universal translation system. Finally, we open source all contributions described in this work, accessible at https://github.com/facebookresearch/fairseq/tree/nllb.
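
The abstract above mentions a conditional compute model based on Sparsely Gated Mixture of Experts. As a rough illustration of the general idea only (not the NLLB implementation), the sketch below routes an input to the top-k experts and mixes their outputs with renormalised gate weights. The names `top_k_gate` and `moe_forward` are hypothetical, the gate logits are passed in directly instead of being computed by a learned gating network, and the noise and load-balancing terms used in practice are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(gate_logits, k=2):
    """Return {expert_index: weight} for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so every
    non-selected expert gets weight zero and is never evaluated.
    """
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    return dict(zip(chosen, weights))


def moe_forward(x, experts, gate_logits, k=2):
    """Weighted sum of the outputs of the selected experts (toy scalar case)."""
    gates = top_k_gate(gate_logits, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())


if __name__ == "__main__":
    # Three toy "experts"; the third has a low gate score and is skipped.
    experts = [lambda x: x, lambda x: 2 * x, lambda x: 100 * x]
    print(moe_forward(3.0, experts, gate_logits=[0.0, 0.0, -5.0], k=2))
```

Because only k experts run per input, compute per input stays roughly constant as the total number of experts grows, which is the sense in which the compute is conditional.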

* 190 pages

Bayesian Active Learning with Pretrained Language Models

Apr 16, 2021

Katerina Margatina, Loic Barrault, Nikolaos Aletras

Abstract: Active Learning (AL) is a method to iteratively select data for annotation from a pool of unlabeled data, aiming to achieve better model performance than random selection. Previous AL approaches in Natural Language Processing (NLP) have been limited either to task-specific models that are trained from scratch at each iteration using only the labeled data at hand or to off-the-shelf pretrained language models (LMs) that are not adapted effectively to the downstream task. In this paper, we address these limitations by introducing BALM: Bayesian Active Learning with pretrained language Models. We first propose to adapt the pretrained LM to the downstream task by continuing training with all the available unlabeled data and then use it for AL. We also suggest a simple yet effective fine-tuning method to ensure that the adapted LM is properly trained in both low- and high-resource scenarios during AL. We finally apply Monte Carlo dropout to the downstream model to obtain well-calibrated confidence scores for data selection with uncertainty sampling. Our experiments in five standard natural language understanding tasks demonstrate that BALM provides substantial data efficiency improvements compared to various combinations of acquisition functions, models and fine-tuning methods proposed in recent AL literature.
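
The last step described in this abstract, Monte Carlo dropout followed by uncertainty sampling, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the BALM code: the classifier is a toy linear model, predictive entropy is assumed as the acquisition function (the abstract does not name one), and the function names `mc_dropout_predict` and `select_for_annotation` are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mc_dropout_predict(features, weights, rng, passes=20, drop_prob=0.5):
    """Average class probabilities over several forward passes,
    keeping dropout active so that each pass uses a fresh random mask."""
    keep = 1.0 - drop_prob
    mean_probs = [0.0] * len(weights)
    for _ in range(passes):
        # Inverted dropout: zero some inputs, rescale the survivors.
        masked = [x / keep if rng.random() < keep else 0.0 for x in features]
        logits = [sum(w * x for w, x in zip(row, masked)) for row in weights]
        probs = softmax(logits)
        mean_probs = [m + p / passes for m, p in zip(mean_probs, probs)]
    return mean_probs


def entropy(probs):
    """Predictive entropy; higher means the model is less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_annotation(pool, weights, k, seed=0):
    """Return the indices of the k most uncertain unlabeled examples."""
    rng = random.Random(seed)
    scored = [(entropy(mc_dropout_predict(x, weights, rng)), i)
              for i, x in enumerate(pool)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


if __name__ == "__main__":
    toy_weights = [[1.0, 0.0], [-1.0, 0.0]]   # two classes, two features
    toy_pool = [[10.0, 0.0], [0.0, 0.0]]      # second example is uninformative
    print(select_for_annotation(toy_pool, toy_weights, k=1))
```

In an active learning loop, the selected indices would be sent for annotation, added to the labeled set, and the model fine-tuned again before the next round of selection.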

Supervised Learning of Universal Sentence Representations from Natural Language Inference Data

Jul 08, 2018

Alexis Conneau, Douwe Kiela, Holger Schwenk, Loic Barrault, Antoine Bordes

Abstract: Many modern NLP systems rely on word embeddings, previously trained in an unsupervised manner on large corpora, as base features. Efforts to obtain embeddings for larger chunks of text, such as sentences, have, however, not been as successful. Several attempts at learning unsupervised representations of sentences have not reached satisfactory enough performance to be widely adopted. In this paper, we show how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference datasets can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks. Much like how computer vision uses ImageNet to obtain features, which can then be transferred to other tasks, our work tends to indicate the suitability of natural language inference for transfer learning to other NLP tasks. Our encoder is publicly available.

* EMNLP 2017

Incremental Adaptation Strategies for Neural Network Language Models

Jul 07, 2015

Aram Ter-Sarkisov, Holger Schwenk, Loic Barrault, Fethi Bougares

Abstract: It is today acknowledged that neural network language models outperform backoff language models in applications like speech recognition or statistical machine translation. However, training these models on large amounts of data can take several days. We present efficient techniques to adapt a neural network language model to new data. Instead of training a completely new model or relying on mixture approaches, we propose two new methods: continued training on resampled data or insertion of adaptation layers. We present experimental results in a CAT (computer-assisted translation) environment where the post-edits of professional translators are used to improve an SMT system. Both methods are very fast and achieve significant improvements without overfitting the small adaptation data.

* accepted as workshop paper at ACL-IJCNLP 2015

On Using Monolingual Corpora in Neural Machine Translation

Jun 12, 2015

Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loic Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, Yoshua Bengio

Abstract: Recent work on end-to-end neural network-based architectures for machine translation has shown promising results for En-Fr and En-De translation. Arguably, one of the major factors behind this success has been the availability of high-quality parallel corpora. In this work, we investigate how to leverage abundant monolingual corpora for neural machine translation. Compared to a phrase-based and hierarchical baseline, we obtain up to 1.96 BLEU improvement on the low-resource language pair Turkish-English, and 1.59 BLEU on the focused domain task of Chinese-English chat messages. While our method was initially targeted toward such tasks with less parallel data, we show that it also extends to high-resource languages such as Cs-En and De-En, where we obtain improvements of 0.39 and 0.47 BLEU over the neural machine translation baselines, respectively.

* 9 pages, 2 figures
