Title: Cure the headache of Transformers via Collinear Constrained Attention

Sep 15, 2023

Shiyi Zhu, Jing Ye, Wei Jiang, Qi Zhang, Yifan Wu, Jianguo Li

Abstract: As practical applications built on Large Language Models continue to progress rapidly, extrapolation performance (how well a model handles sequences longer than those seen during training) has become increasingly important in research. In our study, we identified a previously overlooked anomalous behavior in Transformer models that causes chaotic behavior around the closest tokens, which carry the most important information. We have coined this discovery the "headache of Transformers". To address it at its root, we introduce a novel self-attention structure named Collinear Constrained Attention (CoCA). This structure can be seamlessly integrated with existing extrapolation and interpolation methods, as well as other optimization strategies designed for traditional Transformer models. Without any fine-tuning, our model achieves excellent extrapolation performance at inference on sequences 16 to 24 times longer than the training length. We have also improved CoCA's computational and space efficiency to ensure its practicality. We plan to open-source CoCA shortly; in the meantime, our code is available in the appendix so that the experiments can be reproduced.

* 16 pages, 6 figures