Title: BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model

Jan 08, 2024

Yiran Song, Qianyu Zhou, Xiangtai Li, Deng-Ping Fan, Xuequan Lu, Lizhuang Ma

Abstract: In this paper, we address the challenge of image resolution variation for the Segment Anything Model (SAM). SAM, known for its zero-shot generalizability, suffers performance degradation on datasets with varying image sizes. Previous approaches tend to resize the image to a fixed size or modify the model structure, which hinders the preservation of SAM's rich prior knowledge. Moreover, such task-specific tuning requires completely retraining the model, which is costly and impractical for deployment in downstream tasks. We reformulate this issue as a length extrapolation problem, in which the token sequence length varies while the patch size stays the same across images of different sizes. To this end, we propose the Scalable Bias-Mode Attention Mask (BA-SAM) to enhance SAM's adaptability to varying image resolutions without requiring structure modifications. First, we introduce a new scaling factor that keeps the magnitude of the attention layer's dot-product values consistent when the token sequence length changes. Second, we present a bias-mode attention mask that allows each token to prioritize neighboring information, mitigating the impact of untrained distant information. BA-SAM demonstrates efficacy in two scenarios: zero-shot and fine-tuning. Extensive evaluation on diverse datasets, including DIS5K, DUTS, ISIC, COD10K, and COCO, shows that it significantly mitigates performance degradation in the zero-shot setting and achieves state-of-the-art performance with minimal fine-tuning. Furthermore, we propose a generalized model and benchmark, showcasing BA-SAM's generalizability across all four datasets simultaneously.
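
The abstract names two ingredients, a length-aware scaling factor and a bias-mode attention mask, but does not give their formulas. The sketch below only illustrates the general idea and is not the authors' implementation. It assumes a logarithmic length correction for the scale and a linear Manhattan-distance penalty for the bias. The function names and the parameters `n_train` and `slope` are hypothetical.

```python
import math


def length_aware_scale(n_tokens, n_train, head_dim):
    """Assumed scale: the usual 1/sqrt(d), corrected by log(n) / log(n_train).

    n_train is the (hypothetical) token count seen in training; it must be > 1.
    At n_tokens == n_train this reduces to standard 1/sqrt(d) scaling.
    """
    return math.log(n_tokens) / math.log(n_train) / math.sqrt(head_dim)


def distance_bias(grid_h, grid_w, slope=0.1):
    """Assumed bias: minus slope times the Manhattan distance between patches."""
    coords = [(r, c) for r in range(grid_h) for c in range(grid_w)]
    return [
        [-slope * (abs(r1 - r2) + abs(c1 - c2)) for (r2, c2) in coords]
        for (r1, c1) in coords
    ]


def attention_weights(q, k, grid_h, grid_w, n_train, slope=0.1):
    """Softmax over scaled dot products plus the distance bias.

    q and k are lists of n = grid_h * grid_w vectors, one per image patch.
    """
    n, d = len(q), len(q[0])
    scale = length_aware_scale(n, n_train, d)
    bias = distance_bias(grid_h, grid_w, slope)
    weights = []
    for i in range(n):
        scores = [
            scale * sum(a * b for a, b in zip(q[i], k[j])) + bias[i][j]
            for j in range(n)
        ]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Usage: with all-zero queries and keys, only the bias shapes the weights,
# so each patch attends more to nearby patches than to distant ones.
zeros = [[0.0] * 4 for _ in range(16)]
w = attention_weights(zeros, zeros, grid_h=4, grid_w=4, n_train=16)
```

In this sketch the bias is added to the attention scores before the softmax, so distant patches are down-weighted rather than masked out entirely, which matches the abstract's description of prioritizing neighboring information.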

* Code: https://github.com/zongzi13545329/BA-SAM
