Jinghsu Liu

Lingua Custodia's participation at the WMT 2021 Machine Translation using Terminologies shared task

Nov 03, 2021

Melissa Ailem, Jinghsu Liu, Raheel Qader

Abstract: This paper describes Lingua Custodia's submission to the WMT21 shared task on machine translation using terminologies. We consider three directions, namely English to French, Russian, and Chinese. We rely on a Transformer-based architecture as a building block, and we explore a method which introduces two main changes to the standard procedure to handle terminologies. The first one consists in augmenting the training data in such a way as to encourage the model to learn a copy behavior when it encounters terminology constraint terms. The second change is constraint token masking, whose purpose is to ease copy behavior learning and to improve model generalization. Empirical results show that our method satisfies most terminology constraints while maintaining high translation quality.
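
The two changes named in the abstract (training-data augmentation and constraint token masking) can be illustrated with a minimal sketch. The abstract does not give the exact input format, so everything concrete here is an assumption made for illustration: the tag symbols `<S>`, `<C>`, `</C>`, the `<MASK>` token, the function name `augment_source`, and the choice to mask the source-side term while leaving the target-side term visible.

```python
from typing import List, Tuple

# Hypothetical tag and mask symbols; the abstract does not specify them.
START, SEP, END, MASK = "<S>", "<C>", "</C>", "<MASK>"

Constraint = Tuple[List[str], List[str]]  # (source term tokens, target term tokens)


def augment_source(tokens: List[str],
                   constraints: List[Constraint],
                   mask: bool = True) -> List[str]:
    """Inline each target-side constraint term next to its source-side match.

    With mask=True, the matched source tokens are replaced by MASK so that
    the only way to produce the term is to copy the inlined target tokens.
    """
    out: List[str] = []
    i = 0
    while i < len(tokens):
        matched = False
        for src_term, tgt_term in constraints:
            n = len(src_term)
            if n and tokens[i:i + n] == src_term:
                src_part = [MASK] * n if mask else list(src_term)
                out += [START] + src_part + [SEP] + list(tgt_term) + [END]
                i += n
                matched = True
                break
        if not matched:
            out.append(tokens[i])
            i += 1
    return out


sentence = "the central bank raised rates".split()
terms = [(["central", "bank"], ["banque", "centrale"])]
print(" ".join(augment_source(sentence, terms)))
```

Calling `augment_source(sentence, terms, mask=False)` keeps the original source term inside the tags, which corresponds to augmentation without masking.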


Encouraging Neural Machine Translation to Satisfy Terminology Constraints

Jun 07, 2021

Melissa Ailem, Jinghsu Liu, Raheel Qader

Abstract: We present a new approach to encourage neural machine translation to satisfy lexical constraints. Our method acts at the training step, thereby avoiding any extra computational overhead at the inference step. The proposed method combines three main ingredients. The first one consists in augmenting the training data to specify the constraints. Intuitively, this encourages the model to learn a copy behavior when it encounters constraint terms. Compared to previous work, we use a simplified augmentation strategy without source factors. The second ingredient is constraint token masking, which makes it even easier for the model to learn the copy behavior and generalize better. The third one is a modification of the standard cross entropy loss to bias the model towards assigning high probabilities to constraint words. Empirical results show that our method improves upon related baselines in terms of both BLEU score and the percentage of generated constraint terms.
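
The abstract does not state the form of the modified cross entropy loss. One possible way to bias a model towards constraint words, shown here purely as an illustration and not as the authors' formulation, is to up-weight the per-token loss at target positions that belong to a constraint term. The function name `weighted_cross_entropy`, the `is_constraint` flags, and the weight parameter `alpha` are all hypothetical.

```python
import math
from typing import List


def weighted_cross_entropy(log_probs: List[List[float]],
                           targets: List[int],
                           is_constraint: List[bool],
                           alpha: float = 1.0) -> float:
    """Mean token-level cross entropy with extra weight on constraint tokens.

    log_probs[t][v] is the model's log-probability of vocabulary item v at
    target position t. Positions flagged in is_constraint are weighted by
    (1 + alpha); all other positions keep weight 1. This weighting scheme
    is an assumption, not taken from the paper.
    """
    total = 0.0
    for lp, tgt, flagged in zip(log_probs, targets, is_constraint):
        weight = 1.0 + alpha if flagged else 1.0
        total += -weight * lp[tgt]
    return total / len(targets)


# Toy example: two target positions over a two-word vocabulary.
toy_log_probs = [
    [math.log(0.5), math.log(0.5)],
    [math.log(0.25), math.log(0.75)],
]
loss = weighted_cross_entropy(toy_log_probs, [0, 1], [False, True], alpha=1.0)
print(loss)
```

Setting `alpha=0.0` recovers the standard mean cross entropy, so the extra term only affects positions marked as constraint tokens.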
