Title: Weighted Transformer Network for Machine Translation

Nov 06, 2017

Karim Ahmed, Nitish Shirish Keskar, Richard Socher

Abstract: State-of-the-art results on neural machine translation often use attentional sequence-to-sequence models with some form of convolution or recurrence. Vaswani et al. (2017) propose a new architecture that avoids recurrence and convolution completely. Instead, it uses only self-attention and feed-forward layers. While the proposed architecture achieves state-of-the-art results on several machine translation tasks, it requires a large number of parameters and training iterations to converge. We propose Weighted Transformer, a Transformer with modified attention layers, that not only outperforms the baseline network in BLEU score but also converges 15-40% faster. Specifically, we replace the multi-head attention with multiple self-attention branches that the model learns to combine during training. Our model improves the state-of-the-art performance by 0.5 BLEU points on the WMT 2014 English-to-German translation task and by 0.4 BLEU points on the English-to-French translation task.
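The abstract does not spell out how the self-attention branches are combined, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each branch is an independent scaled dot-product self-attention and that the branch outputs are mixed with scalar weights normalized by a softmax; the softmax normalization and the names `self_attention`, `weighted_branches`, and `branch_logits` are all hypothetical choices made for illustration. In a real model the logits would be trained together with the other parameters.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def project(x, w):
    """Multiply a [seq][d_in] sequence by a [d_in][d_out] matrix."""
    return [[sum(xi * w[i][j] for i, xi in enumerate(row))
             for j in range(len(w[0]))] for row in x]


def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention for one branch."""
    q, k, v = project(x, wq), project(x, wk), project(x, wv)
    scale = math.sqrt(len(q[0]))
    out = []
    for qi in q:
        # Attention distribution of this position over all positions.
        scores = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        out.append([sum(s * vj[c] for s, vj in zip(scores, v))
                    for c in range(len(v[0]))])
    return out


def weighted_branches(x, branches, branch_logits):
    """Combine branch outputs with softmax-normalized scalar weights.

    ASSUMPTION: the softmax normalization is an illustrative choice;
    the abstract only says the model learns to combine the branches.
    """
    weights = softmax(branch_logits)
    outputs = [self_attention(x, wq, wk, wv) for wq, wk, wv in branches]
    seq_len, d_out = len(x), len(outputs[0][0])
    combined = [[sum(w * o[t][c] for w, o in zip(weights, outputs))
                 for c in range(d_out)] for t in range(seq_len)]
    return combined, weights


def random_matrix(rng, rows, cols):
    """Small random matrix standing in for learned parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model, n_branches, seq_len = 4, 3, 5
x = random_matrix(rng, seq_len, d_model)
branches = [tuple(random_matrix(rng, d_model, d_model) for _ in range(3))
            for _ in range(n_branches)]
branch_logits = [0.0] * n_branches  # equal logits give equal branch weights

combined, weights = weighted_branches(x, branches, branch_logits)
```

With all logits equal, every branch contributes the same share, so `combined` is the plain average of the branch outputs; training would shift the logits so that more useful branches receive larger weights.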

View paper on

OpenReview
