Title: GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model

Jul 18, 2024

Abdelrahman Shaker, Syed Talal Wasim, Salman Khan, Juergen Gall, Fahad Shahbaz Khan


Abstract: Recent advances in state-space models (SSMs) have shown that they can model long-range dependencies effectively with subquadratic complexity. However, pure SSM-based models still struggle with training stability and with reaching optimal performance on computer vision tasks. Our paper addresses the challenges of scaling SSM-based models for computer vision, particularly the instability and inefficiency of large model sizes. To this end, we introduce a Modulated Group Mamba layer, which divides the input channels into four groups and applies our proposed SSM-based efficient Visual Single Selective Scanning (VSSS) block independently to each group, with each VSSS block scanning in one of the four spatial directions. The Modulated Group Mamba layer also wraps the four VSSS blocks in a channel modulation operator to improve cross-channel communication. Furthermore, we introduce a distillation-based training objective to stabilize the training of large models, leading to consistent performance gains. Our comprehensive experiments demonstrate the merits of the proposed contributions, leading to superior performance over existing methods for image classification on ImageNet-1K, object detection and instance segmentation on MS-COCO, and semantic segmentation on ADE20K. Our tiny variant with 23M parameters achieves state-of-the-art performance with a classification top-1 accuracy of 83.3% on ImageNet-1K, while being 26% more efficient in terms of parameters than the best existing Mamba design of the same model size. Our code and models are available at: https://github.com/Amshaker/GroupMamba.
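
The grouping idea described in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: `toy_scan` is a hypothetical stand-in for a VSSS block (a fixed linear recurrence rather than a learned selective scan), the direction names are illustrative, and the parameter-free gate at the end is only an assumed, simplified form of channel modulation. What the sketch does show is the structure: channels are split into four groups, each group is scanned in one spatial direction, and the concatenated result is rescaled per channel using statistics gathered across all channels.

```python
import math

# Hypothetical names for the four spatial scan directions.
DIRECTIONS = ["left_to_right", "right_to_left", "top_to_bottom", "bottom_to_top"]


def scan_order(height, width, direction):
    """Return the (row, col) positions in the order one scan visits them."""
    row_major = [(r, c) for r in range(height) for c in range(width)]
    col_major = [(r, c) for c in range(width) for r in range(height)]
    orders = {
        "left_to_right": row_major,
        "right_to_left": row_major[::-1],
        "top_to_bottom": col_major,
        "bottom_to_top": col_major[::-1],
    }
    return orders[direction]


def toy_scan(channel_map, direction, decay=0.5):
    """Stand-in for a VSSS block (assumption): h_t = decay * h_{t-1} + x_t."""
    height, width = len(channel_map), len(channel_map[0])
    out = [[0.0] * width for _ in range(height)]
    state = 0.0
    for r, c in scan_order(height, width, direction):
        state = decay * state + channel_map[r][c]
        out[r][c] = state
    return out


def modulated_group_layer(x):
    """x is a list of C channel maps (H x W nested lists), with C divisible by 4."""
    num_channels = len(x)
    assert num_channels % 4 == 0, "channels must split evenly into four groups"
    group_size = num_channels // 4

    # Each group of channels is scanned independently in its own direction.
    scanned = []
    for g, direction in enumerate(DIRECTIONS):
        for channel_map in x[g * group_size:(g + 1) * group_size]:
            scanned.append(toy_scan(channel_map, direction))

    # Assumed, parameter-free channel modulation: gate each output channel by
    # a sigmoid of its input mean relative to the mean over all channels.
    means = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    overall = sum(means) / num_channels
    gates = [1.0 / (1.0 + math.exp(-(m - overall))) for m in means]
    return [[[gate * v for v in row] for row in ch]
            for gate, ch in zip(gates, scanned)]


if __name__ == "__main__":
    ones = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(4)]  # 4 channels, 2 x 2
    for direction, channel in zip(DIRECTIONS, modulated_group_layer(ones)):
        print(direction, channel)
```

Because each group sees only one scan direction, the per-group work is a quarter of what a block scanning all channels in all four directions would do. The gate is the only place in this sketch where information crosses group boundaries.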

* Preprint. Our code and models are available at: https://github.com/Amshaker/GroupMamba
