Title: One Reference Is Not Enough: Diverse Distillation with Reference Selection for Non-Autoregressive Translation

May 28, 2022

Chenze Shao, Xuanfu Wu, Yang Feng

Abstract: Non-autoregressive neural machine translation (NAT) suffers from the multi-modality problem: a source sentence may have multiple correct translations, but the loss function is calculated only against the single reference sentence. Sequence-level knowledge distillation makes the target more deterministic by replacing it with the output of an autoregressive model. However, the multi-modality problem in the distilled dataset is still non-negligible. Furthermore, learning from a specific teacher limits the upper bound of the model's capability, restricting the potential of NAT models. In this paper, we argue that one reference is not enough and propose diverse distillation with reference selection (DDRS) for NAT. Specifically, we first propose a method called SeedDiv for diverse machine translation, which enables us to generate a dataset containing multiple high-quality reference translations for each source sentence. During training, we compare the NAT output with all references and select the one that best fits the NAT output to train the model. Experiments on widely used machine translation benchmarks demonstrate the effectiveness of DDRS, which achieves 29.82 BLEU with only one decoding pass on WMT14 En-De, improving the state-of-the-art performance for NAT by over 1 BLEU. Source code: https://github.com/ictnlp/DDRS-NAT

* NAACL 2022 main conference
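
The reference-selection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it only shows one plausible reading of "select the one that best fits the NAT output", where fit is measured by the negative log-likelihood the model assigns to each candidate reference. The function names `sequence_nll` and `select_reference`, the dictionary-based representation of per-position distributions, and the assumption that every reference has the same length as the model output are all hypothetical simplifications.

```python
import math


def sequence_nll(log_probs, reference):
    """Negative log-likelihood of `reference` under per-position predictions.

    log_probs[t] is a dict mapping each candidate token to its
    log-probability at position t (an assumed, simplified representation
    of a NAT model's independent per-position output distributions).
    """
    if len(reference) != len(log_probs):
        raise ValueError("this sketch assumes reference and output lengths match")
    return -sum(log_probs[t][token] for t, token in enumerate(reference))


def select_reference(log_probs, references):
    """Return (index, loss) of the reference with the lowest loss.

    Assumption: "best fits" is read as "lowest negative log-likelihood".
    """
    losses = [sequence_nll(log_probs, ref) for ref in references]
    best = min(range(len(references)), key=losses.__getitem__)
    return best, losses[best]


# Toy example: a two-position output and three candidate references.
toy_log_probs = [
    {"a": math.log(0.7), "b": math.log(0.3)},
    {"c": math.log(0.4), "d": math.log(0.6)},
]
toy_references = [["a", "c"], ["b", "d"], ["a", "d"]]

best_index, best_loss = select_reference(toy_log_probs, toy_references)
```

In this toy case the third reference receives the highest probability (0.7 × 0.6 = 0.42), so it would be the one used as the training target for this step, while the other references are ignored.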
