Title: Famba-V: Fast Vision Mamba with Cross-Layer Token Fusion

Sep 15, 2024

Hui Shen, Zhongwei Wan, Xin Wang, Mi Zhang

Abstract: Mamba and Vision Mamba (Vim) models have shown their potential as an alternative to methods based on the Transformer architecture. This work introduces Fast Mamba for Vision (Famba-V), a cross-layer token fusion technique to enhance the training efficiency of Vim models. The key idea of Famba-V is to identify and fuse similar tokens across different Vim layers based on a suite of cross-layer strategies, rather than applying token fusion uniformly across all layers as existing works propose. We evaluate the performance of Famba-V on CIFAR-100. Our results show that Famba-V enhances the training efficiency of Vim models by reducing both training time and peak memory usage during training. Moreover, the proposed cross-layer strategies allow Famba-V to deliver superior accuracy-efficiency trade-offs. Together, these results demonstrate that Famba-V is a promising efficiency enhancement technique for Vim models.

* Camera-ready version for the Fourth Workshop on Computational Aspects of Deep Learning at ECCV 2024
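
The abstract's key idea, fusing similar tokens at selected layers rather than at every layer, can be illustrated with a small sketch. This is not the authors' implementation. The pairing rule (each even-position token is matched to its most similar odd-position token by cosine similarity, and the `r` closest pairs are averaged) is an assumption chosen for illustration. The names `fuse_similar_tokens`, `run_layers`, `fuse_layers`, and `r` are hypothetical, and the abstract does not say which layers the cross-layer strategies select.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fuse_similar_tokens(tokens, r):
    """Fuse up to r even-position tokens into their most similar odd-position token.

    Assumed pairing rule for illustration only; fused tokens are averaged.
    """
    src = list(range(0, len(tokens), 2))  # candidates to be fused away
    dst = list(range(1, len(tokens), 2))  # tokens that absorb them
    if not dst or r <= 0:
        return [list(t) for t in tokens]

    # For each source token, find its best partner and the similarity score.
    best = []
    for i in src:
        j = max(dst, key=lambda d: cosine(tokens[i], tokens[d]))
        best.append((cosine(tokens[i], tokens[j]), i, j))
    best.sort(key=lambda item: item[0], reverse=True)

    # Keep only the r most similar pairs.
    merged_into = {}
    for _, i, j in best[:r]:
        merged_into.setdefault(j, []).append(i)
    removed = {i for group in merged_into.values() for i in group}

    # Rebuild the sequence in its original order, averaging each fused group.
    out = []
    for k, tok in enumerate(tokens):
        if k in removed:
            continue
        group = [tokens[i] for i in merged_into.get(k, [])] + [tok]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def run_layers(tokens, num_layers, fuse_layers, r):
    """Apply fusion only at layers in fuse_layers; return the token count after each layer."""
    counts = []
    for layer in range(num_layers):
        # A real Vim block would transform `tokens` here; omitted in this sketch.
        if layer in fuse_layers:
            tokens = fuse_similar_tokens(tokens, r)
        counts.append(len(tokens))
    return counts


tokens = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [1, 1], [1, -1], [1, -1]]
uniform = run_layers(tokens, num_layers=4, fuse_layers={0, 1, 2, 3}, r=1)
upper_only = run_layers(tokens, num_layers=4, fuse_layers={2, 3}, r=1)
print(uniform, upper_only)
```

Fusing at every layer shrinks the sequence soonest, while restricting fusion to a subset of layers keeps more tokens for longer. That difference is the accuracy-efficiency trade-off the abstract attributes to the choice of cross-layer strategy.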