Title: Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model

Jun 27, 2024

Haobo Yuan, Xiangtai Li, Lu Qi, Tao Zhang, Ming-Hsuan Yang, Shuicheng Yan, Chen Change Loy

Abstract: Transformer-based segmentation methods struggle with efficient inference on high-resolution images. Recently, several linear attention architectures, such as Mamba and RWKV, have attracted much attention because they can process long sequences efficiently. In this work, we focus on designing an efficient segment-anything model by exploring these architectures. Specifically, we design a mixed backbone that combines convolution and RWKV operations, which gives the best results in both accuracy and efficiency. In addition, we design an efficient decoder that uses multiscale tokens to obtain high-quality masks. We denote our method as RWKV-SAM, a simple, effective, and fast baseline for SAM-like models. Moreover, we build a benchmark containing various high-quality segmentation datasets and use it to jointly train one efficient yet high-quality segmentation model. Based on the benchmark results, RWKV-SAM achieves outstanding efficiency and segmentation quality compared to transformers and other linear attention models. For example, compared with a transformer model of the same scale, RWKV-SAM achieves more than a 2x speedup and better segmentation performance on various datasets. In addition, RWKV-SAM outperforms recent vision Mamba models, with better classification and semantic segmentation results. Code and models will be publicly available.

* 16 pages; 8 figures
