Title: Fovea Transformer: Efficient Long-Context Modeling with Structured Fine-to-Coarse Attention

Nov 13, 2023

Ziwei He, Jian Yuan, Le Zhou, Jingwen Leng, Bo Jiang

Abstract: The quadratic complexity of self-attention in Transformers has hindered the processing of long text. To alleviate this problem, previous works have proposed to sparsify the attention matrix, taking advantage of the observation that crucial information about a token can be derived from its neighbors. These methods typically combine some form of local attention with global attention. Such combinations introduce abrupt changes in contextual granularity when going from local to global, which may be undesirable. We believe that a smoother transition could enhance the model's ability to capture long-context dependencies. In this study, we introduce Fovea Transformer, a long-context focused transformer that addresses the challenge of capturing global dependencies while maintaining computational efficiency. To achieve this, we construct a multi-scale tree from the input sequence and use representations of context tokens at progressively coarser granularity in the tree as their distance to the query token increases. We evaluate our model on three long-context summarization tasks. It achieves state-of-the-art performance on two of them and competitive results on the third, where some evaluation metrics improve and others regress. Our code is publicly available at https://github.com/ZiweiHe/Fovea-Transformer.
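
The fine-to-coarse idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a binary tree built by mean pooling, a sequence length that is a power of two, and a selection rule that takes the sibling of each ancestor of the query token. The names `build_tree`, `fovea_nodes`, and `attend` are hypothetical and chosen only for this example.

```python
import math


def build_tree(tokens):
    """Build a multi-scale tree by averaging adjacent pairs (an assumed pooling rule).

    levels[0] holds the input tokens; levels[l][i] summarizes positions
    [i * 2**l, (i + 1) * 2**l). Assumes len(tokens) is a power of two.
    """
    levels = [tokens]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([
            [(a + b) / 2 for a, b in zip(prev[i], prev[i + 1])]
            for i in range(0, len(prev), 2)
        ])
    return levels


def fovea_nodes(n, q):
    """Pick (level, index) nodes that cover all n positions for query q.

    The query attends to itself, then to the sibling of each of its
    ancestors, so context far from q is represented by coarser nodes.
    """
    nodes = [(0, q)]
    idx, level = q, 0
    while (1 << level) < n:
        nodes.append((level, idx ^ 1))  # sibling at the current level
        idx >>= 1                       # move up to the parent
        level += 1
    return nodes


def attend(tokens, q):
    """Softmax attention for one query over the selected tree nodes."""
    levels = build_tree(tokens)
    nodes = fovea_nodes(len(tokens), q)
    keys = [levels[l][i] for l, i in nodes]
    query = tokens[q]
    scale = math.sqrt(len(query))
    scores = [sum(a * b for a, b in zip(query, k)) / scale for k in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Keys double as values here to keep the sketch short.
    out = [sum(w * k[j] for w, k in zip(weights, keys)) for j in range(len(query))]
    return out, nodes
```

Under these assumptions, a query attends to roughly log2(n) + 1 nodes rather than n tokens. For example, with 8 tokens and the query at position 5, `fovea_nodes(8, 5)` selects positions 5 and 4 at full resolution, one node summarizing positions 6 to 7, and one node summarizing positions 0 to 3.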
