Abstract: An effective method for improving extremely low-resource neural machine translation is multilingual training, which can be improved further by leveraging monolingual data to create synthetic bilingual corpora with back-translation. This work focuses on closely related languages from the Uralic language family spoken in the Estonian and Finnish geographical regions. We find that multilingual learning and synthetic corpora increase translation quality in every language pair for which we have data. We show that transfer learning and fine-tuning are very effective for low-resource machine translation and achieve the best results. We collected new parallel data for Võro, North Saami, and South Saami, and present the first neural machine translation results for these languages.
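As a rough illustration of the back-translation step mentioned in the abstract above, the sketch below shows how monolingual target-language sentences could be paired with machine-translated source sentences to form a synthetic bilingual corpus. It is a minimal sketch only: the `reverse_model` object and its `translate` method are hypothetical placeholders, not part of the authors' released code or data.

```python
# Minimal sketch of back-translation for synthetic corpus creation.
# `reverse_model` stands in for any trained target-to-source NMT model
# (hypothetical interface, assumed for illustration only).

def back_translate(monolingual_target_sentences, reverse_model):
    """Pair each monolingual target sentence with a machine-translated
    source sentence, yielding synthetic (source, target) training pairs."""
    synthetic_pairs = []
    for target_sentence in monolingual_target_sentences:
        # Translate from the target language back into the source language.
        synthetic_source = reverse_model.translate(target_sentence)
        synthetic_pairs.append((synthetic_source, target_sentence))
    return synthetic_pairs

# The synthetic pairs would then be mixed with the genuine parallel data
# when training the forward (source-to-target) translation model.
```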
Abstract: This technical report describes the results of a collaboration between the NLP research group at the University of Tartu and the Institute of the Estonian Language on improving neural speech synthesis for Estonian. The report (written in Estonian) describes the project results, which can be summarised as follows: (1) Speech synthesis data from 6 speakers, 92.4 hours in total, was collected and openly released (CC-BY-4.0); the data is available at https://konekorpus.tartunlp.ai and https://www.eki.ee/litsents/. (2) Software and models for neural speech synthesis are released as open source (MIT license), available at https://koodivaramu.eesti.ee/tartunlp/text-to-speech. (3) We evaluated the new models and compared them to existing solutions (the HMM-based HTS models from EKI, http://www.eki.ee/heli/, and Google's speech synthesis for Estonian, accessed via https://translate.google.com). The evaluation includes voice acceptability MOS scores for sentences and longer excerpts, a detailed error analysis, and an evaluation of the pre-processing module.
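For readers unfamiliar with the metric, the snippet below is a minimal sketch of how a mean opinion score (MOS) could be aggregated from listener ratings on a 1-5 scale. The rating values shown are illustrative placeholders, not results from the report.

```python
# Minimal MOS aggregation sketch; ratings below are hypothetical examples,
# not evaluation data from the report.
from statistics import mean, stdev

def mos(ratings):
    """Return the mean opinion score and its standard deviation."""
    return mean(ratings), stdev(ratings)

sentence_level_ratings = [4, 5, 3, 4, 4, 5]  # illustrative listener scores
score, spread = mos(sentence_level_ratings)
print(f"MOS: {score:.2f} (sd {spread:.2f})")
```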