Title: Investigating the Decoders of Maximum Likelihood Sequence Models: A Look-ahead Approach

Mar 08, 2020

Yu-Siang Wang, Yen-Ling Kuo, Boris Katz

Abstract: We demonstrate how to practically incorporate multi-step future information into the decoder of a maximum likelihood sequence model. We propose a "k-step look-ahead" module that considers the likelihood information of rollouts up to k steps ahead. Unlike other approaches, which need to train a separate value network to evaluate the rollouts, our look-ahead module can be applied directly to improve the decoding of any sequence model trained in a maximum likelihood framework. We evaluate the module on three datasets of varying difficulty: IM2LATEX-100k (OCR image to LaTeX), WMT16 multimodal machine translation, and WMT14 machine translation. The look-ahead module improves performance on the simpler datasets, IM2LATEX-100k and WMT16 multimodal machine translation. However, on the more difficult dataset, WMT14 machine translation (e.g., because it contains longer sequences), the improvement becomes marginal. Further investigation using the k-step look-ahead suggests that the more difficult tasks suffer from an overestimated EOS (end-of-sentence) probability. We argue that the overestimated EOS probability also causes the performance of beam search to degrade as its beam width increases. We tackle the EOS problem by integrating an auxiliary EOS loss into training, which estimates whether the model should emit EOS or another word. Our experiments show that improving EOS estimation increases not only the performance of our proposed look-ahead module but also the robustness of beam search.
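
The idea of scoring each candidate token by the likelihood of a k-step rollout can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how rollouts are expanded or pruned, so the sketch assumes an exhaustive search over all continuations, which is exponential in k and only workable for a toy vocabulary. The names `toy_log_probs`, `best_rollout_score`, and `lookahead_decode` are hypothetical, and the toy distribution stands in for a trained model.

```python
import math

EOS = "<eos>"

def toy_log_probs(prefix):
    """Hypothetical stand-in for a trained model: next-token log-probs for a prefix."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.55, "y": 0.45},
        ("b",): {"z": 0.95, EOS: 0.05},
    }
    # Any prefix not listed above is assumed to end the sequence with certainty.
    dist = table.get(tuple(prefix), {EOS: 1.0})
    return {tok: math.log(p) for tok, p in dist.items()}

def best_rollout_score(log_probs_fn, prefix, depth):
    """Best cumulative log-likelihood over continuations of `prefix`, at most `depth` steps."""
    if depth == 0 or prefix[-1] == EOS:
        return 0.0
    # Assumption: exhaustive expansion of every next token (cost grows as |V|^depth).
    return max(
        lp + best_rollout_score(log_probs_fn, prefix + [tok], depth - 1)
        for tok, lp in log_probs_fn(prefix).items()
    )

def lookahead_decode(log_probs_fn, k, max_len=20):
    """Pick each token by its own log-prob plus the best k-step rollout after it."""
    prefix = []
    while len(prefix) < max_len and (not prefix or prefix[-1] != EOS):
        scores = {
            tok: lp + best_rollout_score(log_probs_fn, prefix + [tok], k)
            for tok, lp in log_probs_fn(prefix).items()
        }
        prefix.append(max(scores, key=scores.get))
    return prefix

print(lookahead_decode(toy_log_probs, k=0))  # k=0 reduces to greedy decoding
print(lookahead_decode(toy_log_probs, k=1))  # one step of look-ahead
```

With this toy distribution, greedy decoding (k=0) commits to "a" because it has the higher first-step probability and ends with a sequence of probability 0.6 × 0.55 = 0.33, while one step of look-ahead picks "b", whose best continuation gives 0.4 × 0.95 = 0.38. No value network is involved; only the model's own likelihoods are used, which matches the property the abstract highlights.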

* 7 pages, 5 figures
