Title: Smoothing and Shrinking the Sparse Seq2Seq Search Space

Mar 18, 2021

Ben Peters, André F. T. Martins

Abstract: Current sequence-to-sequence models are trained to minimize cross-entropy and use softmax to compute the locally normalized probabilities over target sequences. While this setup has led to strong results in a variety of tasks, one unsatisfying aspect is its length bias: models give high scores to short, inadequate hypotheses and often make the empty string the argmax -- the so-called "cat got your tongue" problem. Recently proposed entmax-based sparse sequence-to-sequence models present a possible solution, since they can shrink the search space by assigning zero probability to bad hypotheses, but their ability to handle word-level tasks with transformers has never been tested. In this work, we show that entmax-based models effectively solve the "cat got your tongue" problem, removing a major source of model error for neural machine translation. In addition, we generalize label smoothing, a critical regularization technique, to the broader family of Fenchel-Young losses, which includes both cross-entropy and the entmax losses. Our resulting label-smoothed entmax loss models set a new state of the art on multilingual grapheme-to-phoneme conversion and deliver improvements and better calibration properties on cross-lingual morphological inflection and machine translation for 6 language pairs.
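To make the abstract's two ideas concrete, here is a minimal pure-Python sketch, not the authors' implementation. It uses sparsemax, the alpha = 2 member of the entmax family, to show how a sparse transformation can assign exactly zero probability to a low-scoring candidate where softmax cannot. It then plugs a label-smoothed target into the sparsemax loss written in Fenchel-Young form. The function names, the toy scores, and the choice to smooth by mixing the one-hot target with a uniform distribution are assumptions made for illustration; the paper's exact formulation may differ.

```python
import math


def softmax(scores):
    """Dense baseline: every candidate gets strictly positive probability."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(scores):
    """Euclidean projection onto the simplex (entmax with alpha = 2)."""
    cumulative = 0.0
    tau = 0.0
    for k, s in enumerate(sorted(scores, reverse=True), start=1):
        cumulative += s
        # The k-th largest score is in the support if this holds;
        # the last k for which it holds fixes the threshold tau.
        if 1.0 + k * s > cumulative:
            tau = (cumulative - 1.0) / k
    return [max(s - tau, 0.0) for s in scores]


def smooth_one_hot(gold_index, vocab_size, epsilon):
    """Assumed smoothing scheme: (1 - epsilon) * one-hot + epsilon * uniform."""
    return [
        (1.0 - epsilon) * (i == gold_index) + epsilon / vocab_size
        for i in range(vocab_size)
    ]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sparsemax_loss(scores, target):
    """Fenchel-Young loss for Omega(p) = 0.5 * ||p||^2 on the simplex."""
    p = sparsemax(scores)
    conjugate = dot(scores, p) - 0.5 * dot(p, p)
    return conjugate + 0.5 * dot(target, target) - dot(scores, target)


scores = [1.5, 1.0, -1.0]                     # toy scores, purely illustrative
dense = softmax(scores)                       # all entries > 0
sparse = sparsemax(scores)                    # last entry is exactly 0
smoothed = smooth_one_hot(0, len(scores), 0.1)
loss = sparsemax_loss(scores, smoothed)       # non-negative by construction
```

In this sketch the loss is zero exactly when the target equals `sparsemax(scores)`, so a smoothed target pulls the model away from fully peaked, one-hot-like outputs in the same way label smoothing does under cross-entropy.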

* NAACL 2021
