Title: Automatic Classification of Human Translation and Machine Translation: A Study from the Perspective of Lexical Diversity

May 10, 2021

Yingxue Fu, Mark-Jan Nederhof

Abstract: By using a trigram model and fine-tuning a pretrained BERT model for sequence classification, we show that machine translation and human translation can be classified with an accuracy above chance level, which suggests that machine translation and human translation differ in a systematic way. The classification accuracy for machine translation is much higher than that for human translation. We show that this may be explained by the difference in lexical diversity between machine translation and human translation. If machine translation has patterns independent of human translation, automatic metrics that measure the deviation of machine translation from human translation may conflate difference with quality. Our experiment with two different types of automatic metrics shows correlation with the result of the classification task. Therefore, we suggest that the difference in lexical diversity between machine translation and human translation be given more attention in machine translation evaluation.

* accepted by MoTra21, Nodalida 2021
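
As a rough illustration of what "lexical diversity" can mean in practice, the sketch below computes a type-token ratio (distinct tokens divided by total tokens) for two texts. This is an assumption for illustration only: the abstract does not say which diversity measures the paper uses, and the function names, the whitespace tokenizer, and the two toy sentences are hypothetical and not taken from the paper.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def type_token_ratio(tokens):
    """Return distinct tokens divided by total tokens, or 0.0 for empty input."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Toy strings invented for this example; they are not data from the paper.
text_a = "the cat sat on the mat and the cat sat on the rug"
text_b = "the cat settled on the mat while its sibling chose a rug"

ttr_a = type_token_ratio(tokenize(text_a))
ttr_b = type_token_ratio(tokenize(text_b))

print(f"text_a TTR: {ttr_a:.3f}")
print(f"text_b TTR: {ttr_b:.3f}")
```

The plain type-token ratio is sensitive to text length, so comparisons are only meaningful between texts of similar size; length-normalized variants exist, but whether the paper relies on any of them is not stated in the abstract.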
