Title: An Evolved Universal Transformer Memory

Oct 17, 2024

Edoardo Cetin, Qi Sun, Tianyu Zhao, Yujin Tang

Abstract: Prior methods propose to offset the escalating costs of modern foundation models by dropping specific parts of their contexts with hand-designed rules, while attempting to preserve their original performance. We overcome this trade-off with Neural Attention Memory Models (NAMMs), introducing a learned network for memory management that improves both the performance and efficiency of transformers. We evolve NAMMs atop pre-trained transformers to provide different latent contexts focusing on the most relevant information for individual layers and attention heads. NAMMs are universally applicable to any model using self-attention, as they condition exclusively on the values in the produced attention matrices. Learning NAMMs on a small set of problems, we achieve substantial performance improvements across multiple long-context benchmarks while cutting the model's input contexts to a fraction of their original sizes. We show that the generality of our conditioning enables zero-shot transfer of NAMMs trained only on language to entirely new transformer architectures, even across input modalities, with their benefits carrying over to vision and reinforcement learning.
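
To make the idea of attention-conditioned memory management concrete, the following is a minimal sketch of pruning one attention head's cached tokens using only the values of its attention matrix. It is an illustration under stated assumptions, not the NAMM method: the scoring rule here (mean attention received by each token) is a hand-written stand-in for the learned, evolved network the abstract describes, and the names `token_scores`, `select_tokens`, `prune_cache`, and the `keep` budget are hypothetical.

```python
from typing import List, Sequence, Tuple


def token_scores(attn: Sequence[Sequence[float]]) -> List[float]:
    """Score each cached token by the mean attention it receives.

    `attn` is one head's attention matrix: rows are queries, columns are
    cached tokens (keys). ASSUMPTION: this fixed rule stands in for a
    learned scoring network; it is not the model from the paper.
    """
    n_queries = len(attn)
    n_keys = len(attn[0])
    return [sum(row[j] for row in attn) / n_queries for j in range(n_keys)]


def select_tokens(attn: Sequence[Sequence[float]], keep: int) -> List[int]:
    """Return indices of the `keep` highest-scoring tokens, in original order."""
    scores = token_scores(attn)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


def prune_cache(keys: Sequence, values: Sequence, kept: Sequence[int]) -> Tuple[list, list]:
    """Drop every cached key/value pair whose index is not in `kept`."""
    return [keys[j] for j in kept], [values[j] for j in kept]


# Toy attention matrix for a single head: 3 queries over 4 cached tokens.
attn = [
    [0.70, 0.10, 0.15, 0.05],
    [0.60, 0.05, 0.30, 0.05],
    [0.50, 0.05, 0.40, 0.05],
]
kept = select_tokens(attn, keep=2)
keys, values = prune_cache(["k0", "k1", "k2", "k3"], ["v0", "v1", "v2", "v3"], kept)
print(kept, keys, values)
```

Because the selection depends only on attention values, the same routine could in principle be applied separately to each layer and head, so that each retains a different subset of the context.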

* 29 pages, 14 figures. Preprint, under submission. Source code is available at https://github.com/SakanaAI/evo-memory
