Álvaro Rodríguez Abella

The Asymptotic Behavior of Attention in Transformers

Dec 03, 2024

Álvaro Rodríguez Abella, João Pedro Silvestre, Paulo Tabuada

Abstract: A key component of transformers is the attention mechanism, which orchestrates how each token influences the propagation of every other token through a transformer. In this paper we provide a rigorous mathematical analysis of the asymptotic properties of attention in transformers. Although we present several results based on different assumptions, all of them point to the same conclusion: all tokens asymptotically converge to each other, a phenomenon that has been empirically reported in the literature. Our findings are carefully compared with existing theoretical results and illustrated by simulations and experimental studies using the GPT-2 model.
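The convergence of tokens described in the abstract can be illustrated with a small numerical toy. The sketch below is not the paper's model or its experiments. It assumes identity query, key, and value matrices, tokens constrained to the unit sphere, a forward-Euler step, and initial tokens drawn from the positive orthant (so they start in a common hemisphere). The names `simulate_attention` and `diameter` and the parameters `beta`, `dt`, and `steps` are hypothetical choices made for this illustration.

```python
import numpy as np


def simulate_attention(x0, steps=2000, dt=0.05, beta=1.0):
    """Iterate a toy self-attention flow on the unit sphere.

    Assumptions (not taken from the paper): identity query/key/value
    matrices, forward-Euler time stepping, renormalization after each step.
    """
    x = x0 / np.linalg.norm(x0, axis=1, keepdims=True)
    for _ in range(steps):
        scores = beta * (x @ x.T)                     # pairwise inner products
        scores -= scores.max(axis=1, keepdims=True)   # numerically stable softmax
        weights = np.exp(scores)
        weights /= weights.sum(axis=1, keepdims=True)
        velocity = weights @ x                        # attention-weighted average
        # Keep only the component tangent to the sphere at each token.
        velocity -= np.sum(velocity * x, axis=1, keepdims=True) * x
        x = x + dt * velocity
        x /= np.linalg.norm(x, axis=1, keepdims=True)  # project back to the sphere
    return x


def diameter(x):
    """Largest Euclidean distance between any two tokens."""
    diffs = x[:, None, :] - x[None, :, :]
    return float(np.linalg.norm(diffs, axis=-1).max())


rng = np.random.default_rng(0)
# Hypothetical setup: 8 tokens in R^3, all with positive coordinates.
x_raw = np.abs(rng.normal(size=(8, 3)))
x_start = x_raw / np.linalg.norm(x_raw, axis=1, keepdims=True)
x_final = simulate_attention(x_raw)

print("initial diameter:", diameter(x_start))
print("final diameter:  ", diameter(x_final))
```

Under these assumptions the final diameter should be many orders of magnitude smaller than the initial one, meaning the tokens have collapsed to (numerically) a single point. Other initializations or weight matrices may behave differently, which is where the assumptions studied in the paper matter.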
