Title: Synthetic Data for Neural Machine Translation of Spoken-Dialects

Nov 28, 2017

Hany Hassan, Mostafa Elaraby, Ahmed Tawfik

Abstract: In this paper, we introduce a novel approach to generating synthetic data for training Neural Machine Translation systems. The proposed approach transforms a given parallel corpus between a written language and a target language into a parallel corpus between a spoken dialect variant and the target language. Our approach is language independent and can be used to generate data for any variant of the source language, such as slang or a spoken dialect, or even for a different language that is closely related to the source language. The approach is based on local embedding projection of distributed representations, which uses monolingual embeddings to transform parallel data across language variants. We report experimental results on Levantine-to-English translation using Neural Machine Translation. We show that the generated synthetic spoken data can improve a very large-scale system by more than 2.8 BLEU points, which indicates that it can be used to provide a reliable translation system for a spoken dialect that lacks sufficient parallel data.
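The abstract does not spell out how the local embedding projection is computed, so the following is only a rough sketch of one plausible reading, not the authors' implementation. It assumes a small seed dictionary of paired written/dialect word vectors (`seed_src`, `seed_dia`), fits a least-squares linear map on the `k` seed pairs nearest to each source word, and replaces the word with the closest dialect word, leaving the target side untouched. All function names, parameters, and the least-squares choice are hypothetical.

```python
import numpy as np


def _cosine(matrix, vec):
    """Cosine similarity between each row of `matrix` and `vec`."""
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(vec)
    return (matrix @ vec) / (norms + 1e-12)


def local_projection(src_vec, seed_src, seed_dia, k=3):
    """Project a written-language vector into the dialect embedding space.

    Assumption: a linear map is fitted by least squares on the k seed
    pairs whose source vectors are most similar to `src_vec`.
    """
    nearest = np.argsort(-_cosine(seed_src, src_vec))[:k]
    # Solve seed_src[nearest] @ W ~= seed_dia[nearest] for W.
    W, *_ = np.linalg.lstsq(seed_src[nearest], seed_dia[nearest], rcond=None)
    return src_vec @ W


def to_dialect(tokens, src_emb, seed_src, seed_dia, dia_vocab, dia_emb, k=3):
    """Replace each known token with its nearest dialect word."""
    out = []
    for tok in tokens:
        if tok not in src_emb:
            out.append(tok)  # keep out-of-vocabulary tokens unchanged
            continue
        projected = local_projection(src_emb[tok], seed_src, seed_dia, k)
        out.append(dia_vocab[int(np.argmax(_cosine(dia_emb, projected)))])
    return out


def synthesize_corpus(pairs, src_emb, seed_src, seed_dia, dia_vocab, dia_emb, k=3):
    """Turn (written, target) sentence pairs into (dialect, target) pairs."""
    return [
        (to_dialect(src, src_emb, seed_src, seed_dia, dia_vocab, dia_emb, k), tgt)
        for src, tgt in pairs
    ]
```

Here `src_emb` is a dictionary from written-language tokens to vectors, while `dia_vocab` and `dia_emb` are a parallel list of dialect words and a matrix of their vectors. Both embedding sets would be trained on monolingual text, in line with the abstract's description.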
