Dilated Sliding Window Attention


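The papers below all build on variations of dilated sliding window (local) attention, where each query attends only to keys inside a fixed-size window around its own position, optionally skipping positions by a dilation factor. As a rough illustration of the pattern they share, here is a minimal, hedged sketch in PyTorch; the function name and parameters are illustrative and not taken from any of the listed papers, and for clarity it builds a full attention mask rather than the efficient windowed kernels those papers implement.

import torch
import torch.nn.functional as F

def dilated_sliding_window_attention(q, k, v, window_size=8, dilation=1):
    """Single-head attention where query i attends only to keys j with
    |i - j| <= window_size * dilation and (i - j) divisible by dilation.

    q, k, v: tensors of shape (seq_len, head_dim).
    Note: this reference version materializes a full (seq_len x seq_len)
    mask, so it still does O(n^2) work; efficient kernels gather only the
    windowed keys to get the linear-cost benefit.
    """
    n, d = q.shape
    i = torch.arange(n).unsqueeze(1)          # query positions, shape (n, 1)
    j = torch.arange(n).unsqueeze(0)          # key positions,   shape (1, n)
    offset = j - i
    within_window = offset.abs() <= window_size * dilation
    on_dilation_grid = (offset % dilation) == 0
    mask = within_window & on_dilation_grid   # (n, n) boolean attention mask

    scores = (q @ k.transpose(-1, -2)) / d ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)       # each row sums to 1 over its window
    return weights @ v

# Example: 16 tokens, 4-dim head, window of 2 with dilation 2
q, k, v = (torch.randn(16, 4) for _ in range(3))
out = dilated_sliding_window_attention(q, k, v, window_size=2, dilation=2)
print(out.shape)  # torch.Size([16, 4])

Because the diagonal (offset 0) is always inside the window, every query has at least one valid key, so the softmax is always well defined.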
Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level

Mar 07, 2024

Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation

May 05, 2023

DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition

Feb 03, 2023

Dilated Neighborhood Attention Transformer

Sep 29, 2022

BumbleBee: A Transformer for Music

Jul 07, 2021