Title: Document-aligned Japanese-English Conversation Parallel Corpus

Dec 11, 2020

Matīss Rikters, Ryokan Ri, Tong Li, Toshiaki Nakazawa

Abstract: Sentence-level (SL) machine translation (MT) has reached acceptable quality for many high-resourced languages, but document-level (DL) MT has not: it is difficult to 1) train with the small amount of DL data available; and 2) evaluate, as the main methods and data sets focus on SL evaluation. To address the first issue, we present a document-aligned Japanese-English conversation corpus, including balanced, high-quality business conversation data for tuning and testing. As for the second issue, we manually identify the main areas where SL MT fails to produce adequate translations for lack of context. We then create an evaluation set where these phenomena are annotated to facilitate automatic evaluation of DL systems. We train MT models using our corpus to demonstrate how using context leads to improvements.

* Proceedings of the Fifth Conference on Machine Translation (2020), pages 637-643
