Title: Don't Rule Out Monolingual Speakers: A Method For Crowdsourcing Machine Translation Data

Jun 12, 2021

Rajat Bhatnagar, Ananya Ganesh, Katharina Kann

Abstract: High-performing machine translation (MT) systems can help overcome language barriers and make it possible for everyone to communicate and use language technologies in the language of their choice. However, such systems require large amounts of parallel sentences for training, and translators can be difficult to find and expensive to hire. Here, we present a data collection strategy for MT that, in contrast, is cheap and simple, as it does not require bilingual speakers. Based on the insight that humans pay particular attention to movement, we use Graphics Interchange Format (GIF) animations as a pivot to collect parallel sentences from monolingual annotators. We use our strategy to collect data in Hindi, Tamil and English. As a baseline, we also collect data using static images as a pivot. We perform an intrinsic evaluation by manually evaluating a subset of the sentence pairs and an extrinsic evaluation by finetuning mBART on the collected data. We find that sentences collected via GIFs are indeed of higher quality than those collected via images.
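To make the pivot idea concrete, here is a minimal Python sketch of how descriptions written independently by monolingual annotators could be turned into sentence pairs. It assumes each record is a hypothetical `(pivot_id, language, sentence)` tuple and pairs every source-language description of a GIF with every target-language description of the same GIF. The function name `pair_by_pivot`, the record format, and this all-combinations pairing rule are illustrative assumptions, not necessarily how the authors built their dataset; the example sentences are invented.

```python
from collections import defaultdict
from itertools import product


def pair_by_pivot(descriptions, src_lang, tgt_lang):
    """Build (source, target) sentence pairs from monolingual descriptions
    that share the same pivot (e.g. the same GIF).

    descriptions: iterable of (pivot_id, language, sentence) tuples
    (a hypothetical record format).
    Assumption: every cross-language combination for a pivot is kept.
    """
    # Group sentences first by pivot, then by language.
    by_pivot = defaultdict(lambda: defaultdict(list))
    for pivot_id, lang, sentence in descriptions:
        by_pivot[pivot_id][lang].append(sentence)

    pairs = []
    for pivot_id in sorted(by_pivot):
        langs = by_pivot[pivot_id]
        # A pivot described in only one of the two languages yields no pairs.
        for src, tgt in product(langs.get(src_lang, []), langs.get(tgt_lang, [])):
            pairs.append((src, tgt))
    return pairs


# Invented toy data: gif_002 has no Hindi description, so it yields no pair.
data = [
    ("gif_001", "en", "A dog jumps into a pool."),
    ("gif_001", "hi", "एक कुत्ता पूल में कूदता है।"),
    ("gif_002", "en", "A man waves at the camera."),
]
print(pair_by_pivot(data, "hi", "en"))
```

Keeping all combinations maximizes the number of pairs per GIF; a real pipeline might instead filter or deduplicate pairs, for example after the kind of manual quality check the abstract describes.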

* 5 pages, 1 figure, ACL-IJCNLP 2021 submission, Natural Language Processing, Data Collection, Monolingual Speakers, Machine Translation, GIFs, Images
