Title: Attention Alignment and Flexible Positional Embeddings Improve Transformer Length Extrapolation

Nov 15, 2023

Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky

Abstract: An ideal length-extrapolatable Transformer language model can handle sequences longer than the training length without any fine-tuning. Such long-context utilization capability relies heavily on a flexible positional embedding design. Upon investigating the flexibility of existing large pre-trained Transformer language models, we find that the T5 family deserves a closer look, as its positional embeddings capture rich and flexible attention patterns. However, T5 suffers from the dispersed attention issue: the longer the input sequence, the flatter the attention distribution. To alleviate the issue, we propose two attention alignment strategies via temperature scaling. Our findings show improvement on the long-context utilization capability of T5 on language modeling, retrieval, multi-document question answering, and code completion tasks without any fine-tuning. This suggests that a flexible positional embedding design and attention alignment can go a long way toward Transformer length extrapolation.
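
The dispersed attention issue mentioned in the abstract can be illustrated with a small, generic sketch. It is not the paper's method: the two alignment strategies are not described on this page, so the code only shows the underlying effect. The assumptions are that random Gaussian logits stand in for real attention scores, and that a fixed temperature of 0.8 (an arbitrary illustrative value) is applied before the softmax. The entropy of the resulting distribution serves as a rough measure of how flat the attention is.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def demo(lengths=(512, 2048, 8192), temperature=0.8, seed=0):
    """Compare attention entropy with and without temperature scaling.

    The logits are synthetic (standard Gaussian), an assumption made
    purely for illustration; real attention scores will differ.
    """
    rng = random.Random(seed)
    rows = []
    for n in lengths:
        logits = [rng.gauss(0.0, 1.0) for _ in range(n)]
        plain = entropy(softmax(logits))
        sharpened = entropy(softmax(logits, temperature))
        rows.append((n, plain, sharpened))
    return rows


if __name__ == "__main__":
    for n, plain, sharpened in demo():
        print(f"length={n:5d}  entropy T=1.0: {plain:.3f}  T=0.8: {sharpened:.3f}")
```

Under these assumptions, the entropy grows with the sequence length, and a temperature below 1 lowers it at every length, which is the general mechanism by which temperature scaling can counteract a flattening attention distribution.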
