Title: DuoFormer: Leveraging Hierarchical Visual Representations by Local and Global Attention

Jul 18, 2024

Xiaoya Tang, Bodong Zhang, Beatrice S. Knudsen, Tolga Tasdizen

Abstract: We propose a novel hierarchical transformer model that integrates the feature extraction capabilities of Convolutional Neural Networks (CNNs) with the representational potential of Vision Transformers (ViTs). To address the lack of inductive biases in ViTs and their dependence on extensive training datasets, our model employs a CNN backbone to generate hierarchical visual representations. These representations are then adapted for transformer input through an innovative patch tokenization. We also introduce a 'scale attention' mechanism that captures cross-scale dependencies, complementing patch attention to enhance spatial understanding and preserve global perception. Our approach significantly outperforms baseline models on small and medium-sized medical datasets, demonstrating its efficiency and generalizability. The components are designed as plug-and-play modules for different CNN architectures and can be adapted for multiple applications. The code is available at https://github.com/xiaoyatang/DuoFormer.git.
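The interplay of the two attention types described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). It assumes that feature maps from several CNN stages already share a channel width `D`, that each map is average-pooled onto a common patch grid so every patch holds one token per scale, that attention uses identity query/key/value projections, and that scale attention is applied before patch attention. All function names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # Single-head self-attention over the second-to-last axis.
    # Assumption: identity Q/K/V projections, to keep the sketch short.
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ tokens

def pool_to_grid(fmap, grid):
    # Average-pool an (H, W, D) feature map to (grid, grid, D).
    # Assumes H and W are divisible by grid.
    h, w, d = fmap.shape
    return fmap.reshape(grid, h // grid, grid, w // grid, d).mean(axis=(1, 3))

def multiscale_tokens(fmaps, grid):
    # Stack one token per scale for every patch: result is (P, S, D),
    # with P = grid * grid patches and S = len(fmaps) scales.
    pooled = [pool_to_grid(f, grid).reshape(grid * grid, -1) for f in fmaps]
    return np.stack(pooled, axis=1)

def scale_then_patch_attention(tokens):
    # Scale attention: tokens of the same patch attend across scales.
    per_patch = self_attention(tokens)        # (P, S, D)
    # Assumption: merge scales by averaging before patch attention.
    patch_tokens = per_patch.mean(axis=1)     # (P, D)
    # Patch attention: patches attend to each other across the image.
    return self_attention(patch_tokens)       # (P, D)

# Hypothetical three-stage backbone output at 28x28, 14x14, and 7x7.
rng = np.random.default_rng(0)
fmaps = [rng.standard_normal((s, s, 16)) for s in (28, 14, 7)]
tokens = multiscale_tokens(fmaps, grid=7)
out = scale_then_patch_attention(tokens)
print(tokens.shape, out.shape)
```

In this sketch the scale axis is short (one token per CNN stage), so the cross-scale step adds little cost compared with attention over all patches.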

* 11 pages, 5 figures
