Title: Understanding How Encoder-Decoder Architectures Attend

Oct 28, 2021

Kyle Aitken, Vinay V Ramasesh, Yuan Cao, Niru Maheswaranathan

Abstract: Encoder-decoder networks with attention have proven to be a powerful way to solve many sequence-to-sequence tasks. In these networks, attention aligns encoder and decoder states and is often used for visualizing network behavior. However, the mechanisms used by networks to generate appropriate attention matrices are still mysterious. Moreover, how these mechanisms vary depending on the particular architecture used for the encoder and decoder (recurrent, feed-forward, etc.) is also not well understood. In this work, we investigate how encoder-decoder networks solve different sequence-to-sequence tasks. We introduce a way of decomposing hidden states over a sequence into temporal (independent of input) and input-driven (independent of sequence position) components. This reveals how attention matrices are formed: depending on the task requirements, networks rely more heavily on either the temporal or input-driven components. These findings hold across both recurrent and feed-forward architectures despite their differences in forming the temporal components. Overall, our results provide new insight into the inner workings of attention-based encoder-decoder networks.
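
The decomposition mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so the averaging scheme here is an assumption: the temporal component is taken as the mean hidden state at each position across sequences, and the input-driven component as the mean of what remains for each token across all its occurrences. The function name `decompose` and the nested-list layout are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def decompose(hidden, tokens):
    """Split hidden states into temporal, input-driven, and residual parts.

    hidden[n][t] is the state (list of floats) for sequence n at position t.
    tokens[n][t] is the input token at that position.
    Assumption: all sequences have the same length.
    """
    n_seq, n_pos, dim = len(hidden), len(hidden[0]), len(hidden[0][0])

    # Temporal component: mean state at each position, averaged over sequences.
    temporal = [
        [sum(hidden[n][t][d] for n in range(n_seq)) / n_seq for d in range(dim)]
        for t in range(n_pos)
    ]

    # Input-driven component: mean of (state - temporal) over every
    # occurrence of a token, regardless of where it appears.
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for n in range(n_seq):
        for t in range(n_pos):
            tok = tokens[n][t]
            counts[tok] += 1
            for d in range(dim):
                sums[tok][d] += hidden[n][t][d] - temporal[t][d]
    input_driven = {
        tok: [s / counts[tok] for s in vec] for tok, vec in sums.items()
    }

    # Residual: whatever the two averaged components do not explain.
    residual = [
        [
            [
                hidden[n][t][d] - temporal[t][d] - input_driven[tokens[n][t]][d]
                for d in range(dim)
            ]
            for t in range(n_pos)
        ]
        for n in range(n_seq)
    ]
    return temporal, input_driven, residual
```

Under this assumed scheme, a small residual means the hidden states are close to a sum of a position-only term and a token-only term, which is the regime in which the two components the abstract describes would account for most of the attention alignment.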

* 10+14 pages, 16 figures. NeurIPS 2021

View paper on OpenReview
