Title: mmT5: Modular Multilingual Pre-Training Solves Source Language Hallucinations

May 23, 2023

Jonas Pfeiffer, Francesco Piccinno, Massimo Nicosia, Xinyi Wang, Machel Reid, Sebastian Ruder

Abstract: Multilingual sequence-to-sequence models perform poorly with increased language coverage and fail to consistently generate text in the correct target language in few-shot settings. To address these challenges, we propose mmT5, a modular multilingual sequence-to-sequence model. mmT5 utilizes language-specific modules during pre-training, which disentangle language-specific information from language-agnostic information. We identify representation drift during fine-tuning as a key limitation of modular generative models and develop strategies that enable effective zero-shot transfer. Our model outperforms mT5 at the same parameter sizes by a large margin on representative natural language understanding and generation tasks in 40+ languages. Compared to mT5, mmT5 raises the rate of generating text in the correct language under zero-shot settings from 7% to 99%, thereby greatly alleviating the source language hallucination problem.
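
The abstract does not say how the language-specific modules are built or where they sit in the network. As a rough illustration of the general idea only, the sketch below assumes a shared (language-agnostic) transformation followed by a small per-language bottleneck module selected by a language ID and added back residually. The names `LanguageModule` and `ModularLayer`, the bottleneck shape, and the residual connection are all hypothetical choices made for this example, not details confirmed by the abstract.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a stand-in for learned parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LanguageModule:
    """Hypothetical per-language bottleneck: down-project, ReLU, up-project."""

    def __init__(self, hidden, bottleneck, rng):
        self.down = random_matrix(bottleneck, hidden, rng)
        self.up = random_matrix(hidden, bottleneck, rng)

    def __call__(self, h):
        z = [max(0.0, v) for v in matvec(self.down, h)]
        return matvec(self.up, z)


class ModularLayer:
    """Shared weights used for every language, plus one module per language."""

    def __init__(self, hidden, bottleneck, languages, seed=0):
        rng = random.Random(seed)
        self.shared = random_matrix(hidden, hidden, rng)
        self.modules = {
            lang: LanguageModule(hidden, bottleneck, rng) for lang in languages
        }

    def forward(self, h, lang):
        # Language-agnostic computation, identical for all inputs.
        shared_out = matvec(self.shared, h)
        # Route through the module of the given language (assumed design),
        # then add its output back to the shared representation.
        lang_out = self.modules[lang](shared_out)
        return [s + l for s, l in zip(shared_out, lang_out)]


layer = ModularLayer(hidden=4, bottleneck=2, languages=["en", "de"])
h = [1.0, -0.5, 0.25, 2.0]
out_en = layer.forward(h, "en")  # uses only the shared weights and the "en" module
out_de = layer.forward(h, "de")  # same shared weights, different module
```

In a design of this kind, the shared weights see data from every language while each module sees only its own, which is one way language-specific information could be separated from language-agnostic information. Selecting the target-language module at generation time is then an explicit signal of which language to produce.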