Dilated Sliding Window Attention


Training Transformers for Mesh-Based Simulations

Aug 25, 2025
Via arXiv

DiNAT-IR: Exploring Dilated Neighborhood Attention for High-Quality Image Restoration

Jul 23, 2025
Via arXiv

Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level

Mar 07, 2024
Via arXiv

Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation

May 05, 2023
Via arXiv

DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition

Feb 03, 2023
Via arXiv

Dilated Neighborhood Attention Transformer

Sep 29, 2022
Via arXiv

BumbleBee: A Transformer for Music

Jul 07, 2021
Via arXiv