
Title: Investigating the translation capabilities of Large Language Models trained on parallel data only

Jun 13, 2024

Javier García Gilabert, Carlos Escolano, Aleix Sant Savall, Francesca De Luca Fornaciari, Audrey Mash, Xixian Liao, Maite Melero


Abstract: In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methods predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving the challenges of training LLMs solely on parallel data unexplored. In this work, we introduce PLUME (Parallel Language Model), a collection of three 2B-parameter LLMs with different vocabulary sizes (32k, 128k, and 256k), trained exclusively on Catalan-centric parallel examples. These models perform comparably to previous encoder-decoder architectures on 16 supervised translation directions and 56 zero-shot ones. Using these models, we conduct a thorough investigation into the translation capabilities of LLMs, probing their performance, the impact of the different elements of the prompt, and their cross-lingual representation space.
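The direction counts above fit simple pivot arithmetic. Suppose the supervised directions are every non-Catalan language paired with Catalan in both directions, and the zero-shot directions are every ordered pair of distinct non-Catalan languages. Under that assumption, 16 supervised and 56 zero-shot directions imply 8 non-Catalan languages. This is an inference: the abstract does not list the languages. The sketch below checks the arithmetic. `count_directions`, `enumerate_directions`, and the placeholder codes `L1` to `L8` are hypothetical names used only for illustration.

```python
from itertools import permutations


def count_directions(num_other_langs):
    """Hypothetical helper: direction counts for a pivot-centric setup.

    Supervised: each non-pivot language paired with the pivot, both ways.
    Zero-shot: each ordered pair of distinct non-pivot languages.
    """
    supervised = 2 * num_other_langs
    zero_shot = num_other_langs * (num_other_langs - 1)
    return supervised, zero_shot


def enumerate_directions(pivot, others):
    """Hypothetical helper: list the (source, target) pairs explicitly."""
    supervised = [(pivot, x) for x in others] + [(x, pivot) for x in others]
    zero_shot = list(permutations(others, 2))  # ordered pairs, no self-pairs
    return supervised, zero_shot


# Which number of non-Catalan languages reproduces the abstract's counts?
matches = [k for k in range(1, 50) if count_directions(k) == (16, 56)]
print(matches)  # [8]

# Placeholder language codes ("ca" is Catalan; L1..L8 are NOT the real languages).
others = [f"L{i}" for i in range(1, 9)]
sup, zs = enumerate_directions("ca", others)
print(len(sup), len(zs))  # 16 56
```

Only one language count matches both figures, because the supervised count grows linearly and the zero-shot count grows quadratically.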

* We release our code at: https://github.com/projecte-aina/Plume
