Title: Exploring Transformer Extrapolation

Jul 19, 2023

Zhen Qin, Yiran Zhong, Hui Deng

Abstract: Length extrapolation has attracted considerable attention recently since it allows transformers to be tested on longer sequences than those used in training. Previous research has shown that this property can be attained by using carefully designed Relative Positional Encodings (RPEs). While these methods perform well on a variety of corpora, the conditions for length extrapolation have yet to be investigated. This paper attempts to determine what types of RPEs allow for length extrapolation through a thorough mathematical and empirical analysis. We discover that a transformer is certain to possess this property as long as the series that corresponds to the RPE's exponential converges. Two practices are derived from the conditions and examined in language modeling tasks on a variety of corpora. As a bonus from the conditions, we derive a new Theoretical Receptive Field (TRF) to measure the receptive field of RPEs without taking any training steps. Extensive experiments are conducted on the Wikitext-103, Books, Github, and WikiBook datasets to demonstrate the viability of our discovered conditions. We also compare TRF to Empirical Receptive Field (ERF) across different models, showing consistently matched trends on the aforementioned datasets. The code is available at https://github.com/OpenNLPLab/Rpe.

* Zhen Qin and Yiran Zhong contributed equally to this paper; Yiran Zhong is the corresponding author. The code is available at https://github.com/OpenNLPLab/Rpe.
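
The convergence condition described in the abstract can be illustrated numerically. The sketch below is not taken from the paper or its repository. It assumes the RPE is an additive bias `b(n)` that depends only on the relative distance `n`, and that "the series that corresponds to the RPE's exponential" means the sum of `exp(b(n))` over `n`. The three bias functions and the helpers `doubling_gain` and `coverage_length` are hypothetical names introduced for illustration. In particular, `coverage_length` is only a loose, training-free stand-in for the idea of a receptive field, not the paper's TRF definition.

```python
import math


def linear_bias(n, slope=0.5):
    # Hypothetical linearly decaying bias: exp(b(n)) is a geometric term.
    return -slope * n


def inverse_square_bias(n):
    # Hypothetical log-decay bias: exp(b(n)) = 1 / n**2 (convergent p-series).
    return -2.0 * math.log(n)


def harmonic_bias(n):
    # Hypothetical slower log-decay bias: exp(b(n)) = 1 / n (divergent series).
    return -math.log(n)


def doubling_gain(bias, n):
    """Mass added to the series when the horizon grows from n to 2n.

    For a convergent series this tends to 0 as n grows; for the harmonic
    series it stays close to ln(2), a sign that the sum never settles.
    """
    return sum(math.exp(bias(k)) for k in range(n + 1, 2 * n + 1))


def coverage_length(bias, fraction, horizon):
    """Smallest distance N whose partial sum reaches `fraction` of the
    sum over 1..horizon. Assumed illustrative measure, not the paper's TRF."""
    terms = [math.exp(bias(k)) for k in range(1, horizon + 1)]
    target = fraction * sum(terms)
    running = 0.0
    for distance, term in enumerate(terms, start=1):
        running += term
        if running >= target:
            return distance
    return horizon


if __name__ == "__main__":
    for name, bias in [
        ("linear", linear_bias),
        ("inverse square", inverse_square_bias),
        ("harmonic", harmonic_bias),
    ]:
        print(name, doubling_gain(bias, 10_000))
    print("95% coverage, linear bias:", coverage_length(linear_bias, 0.95, 1000))
```

Under these assumptions, the first two biases add a vanishing amount of mass as the horizon doubles, while the harmonic case keeps adding roughly the same amount. This is a finite numerical check, so it suggests convergence or divergence rather than proving it.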
