

MDIA: A Benchmark for Multilingual Dialogue Generation in 46 Languages

Aug 27, 2022

Qingyu Zhang, Xiaoyu Shen, Ernie Chang, Jidong Ge, Pengke Chen


Abstract: Owing to the lack of corpora for low-resource languages, current work on dialogue generation has mainly focused on English. In this paper, we present mDIA, the first large-scale multilingual benchmark for dialogue generation across low- to high-resource languages. It covers real-life conversations in 46 languages across 19 language families. We present baseline results obtained by fine-tuning the multilingual, non-dialogue-focused pre-trained model mT5 as well as the English-centric, dialogue-focused pre-trained chatbot DialoGPT. The results show that mT5-based models perform better on sacreBLEU and BERTScore but worse on diversity. Although promising results are found in few-shot and zero-shot scenarios, there is a large gap between generation quality in English and in other languages. We hope that the release of mDIA will encourage more work on multilingual dialogue generation to promote language diversity.
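The abstract reports that mT5-based models score worse on diversity, but it does not say here which diversity measure is used. As an illustration only, the sketch below assumes distinct-n (the ratio of unique n-grams to total n-grams across all generated responses), a measure commonly used in dialogue generation. The function name `distinct_n`, the whitespace tokenization, and the toy responses are assumptions for this example, not taken from the mDIA scripts.

```python
from typing import List


def distinct_n(responses: List[str], n: int = 2) -> float:
    """Hypothetical helper: unique n-grams / total n-grams over all responses.

    Assumption: whitespace tokenization. This is a simplification; languages
    written without spaces (e.g. Chinese, Japanese, Thai) would need a proper
    tokenizer or character-level n-grams.
    """
    total = 0
    unique = set()
    for response in responses:
        tokens = response.split()
        # Responses shorter than n tokens contribute no n-grams.
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0


if __name__ == "__main__":
    # Made-up toy responses, not drawn from mDIA.
    generic = ["i do not know", "i do not know", "i do not know"]
    varied = ["i love hiking", "the weather is nice", "see you tomorrow"]
    print(f"distinct-2 generic: {distinct_n(generic):.3f}")  # 0.333
    print(f"distinct-2 varied:  {distinct_n(varied):.3f}")   # 1.000
```

A model that repeats the same safe reply scores low, while varied replies push the ratio toward 1. For a 46-language benchmark, the tokenizer choice matters as much as the formula, which is why the whitespace split is flagged as an assumption.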

* The dataset and processing scripts are available at https://github.com/DoctorDream/mDIA
