Title: On The Adaptation of Unlimiformer for Decoder-Only Transformers

Oct 02, 2024

Kian Ahrabian, Alon Benhaim, Barun Patra, Jay Pujara, Saksham Singhal, Xia Song

Abstract: One of the prominent issues stifling the current generation of large language models is their limited context length. Recent proprietary models such as GPT-4 and Claude 2 have introduced longer context lengths (8k/32k and 100k tokens, respectively); however, despite the community's efforts, most common models, such as Llama-2, have a context length of 4k or less. Unlimiformer (Bertsch et al., 2023) is a recently popular vector-retrieval augmentation method that offloads cross-attention computations to a kNN index. Its main limitation, however, is that it is incompatible with decoder-only transformers out of the box. In this work, we explore practical considerations of adapting Unlimiformer to decoder-only transformers and introduce a series of modifications to overcome this limitation. Moreover, we expand the original experimental setup on summarization to include a new task (i.e., free-form Q&A) and an instruction-tuned model (i.e., a custom 6.7B GPT model). Our results showcase the effectiveness of these modifications on summarization, performing on par with a model with 2x the context length. Finally, we discuss limitations and future directions for free-form Q&A and instruction-tuned models.
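The retrieval idea the abstract attributes to Unlimiformer, attending over only the nearest keys instead of the full context, can be illustrated with a small sketch. This is not the authors' implementation and does not reflect the decoder-only modifications the paper introduces. The function names (`topk_attention`, `dot`, `softmax`) and the toy data are hypothetical, and a brute-force search stands in for a real kNN index.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topk_attention(query, keys, values, k):
    """Attend over only the k keys with the highest dot product with `query`.

    Returns the attention output and the indices of the retrieved keys.
    Assumption: the brute-force sort below is a stand-in for a kNN index.
    """
    scores = [dot(query, key) for key in keys]
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, top)) for d in range(dim)]
    return out, top


# Toy data: four stored key/value pairs and one query.
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [9.0, 9.0]]
query = [4.0, 0.0]

approx, picked = topk_attention(query, keys, values, k=2)   # retrieval-based
full, _ = topk_attention(query, keys, values, k=len(keys))  # ordinary attention
```

With `k` equal to the number of keys, the function reduces to ordinary softmax attention. Smaller `k` trades exactness for a cost that depends on the number of retrieved keys rather than on the full context length.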

* 8 pages, 6 figures
