Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tapas Kumar Dutta

SketchFusion: Learning Universal Sketch Features through Fusing Foundation Models

Mar 18, 2025

Subhadeep Koley, Tapas Kumar Dutta, Aneeshan Sain, Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Yi-Zhe Song

Abstract:While foundation models have revolutionised computer vision, their effectiveness for sketch understanding remains limited by the unique challenges of abstract, sparse visual inputs. Through systematic analysis, we uncover two fundamental limitations: Stable Diffusion (SD) struggles to extract meaningful features from abstract sketches (unlike its success with photos), and exhibits a pronounced frequency-domain bias that suppresses essential low-frequency components needed for sketch understanding. Rather than costly retraining, we address these limitations by strategically combining SD with CLIP, whose strong semantic understanding naturally compensates for SD's spatial-frequency biases. By dynamically injecting CLIP features into SD's denoising process and adaptively aggregating features across semantic levels, our method achieves state-of-the-art performance in sketch retrieval (+3.35%), recognition (+1.06%), segmentation (+29.42%), and correspondence learning (+21.22%), demonstrating the first truly universal sketch feature representation in the era of foundation models.

* Accepted in CVPR 2025. Project page available at https://subhadeepkoley.github.io/SketchFusion/

Via

Access Paper or Ask Questions

SAM-Mamba: Mamba Guided SAM Architecture for Generalized Zero-Shot Polyp Segmentation

Dec 11, 2024

Tapas Kumar Dutta, Snehashis Majhi, Deepak Ranjan Nayak, Debesh Jha

Figure 1 for SAM-Mamba: Mamba Guided SAM Architecture for Generalized Zero-Shot Polyp Segmentation

Figure 2 for SAM-Mamba: Mamba Guided SAM Architecture for Generalized Zero-Shot Polyp Segmentation

Figure 3 for SAM-Mamba: Mamba Guided SAM Architecture for Generalized Zero-Shot Polyp Segmentation

Figure 4 for SAM-Mamba: Mamba Guided SAM Architecture for Generalized Zero-Shot Polyp Segmentation

Abstract:Polyp segmentation in colonoscopy is crucial for detecting colorectal cancer. However, it is challenging due to variations in the structure, color, and size of polyps, as well as the lack of clear boundaries with surrounding tissues. Traditional segmentation models based on Convolutional Neural Networks (CNNs) struggle to capture detailed patterns and global context, limiting their performance. Vision Transformer (ViT)-based models address some of these issues but have difficulties in capturing local context and lack strong zero-shot generalization. To this end, we propose the Mamba-guided Segment Anything Model (SAM-Mamba) for efficient polyp segmentation. Our approach introduces a Mamba-Prior module in the encoder to bridge the gap between the general pre-trained representation of SAM and polyp-relevant trivial clues. It injects salient cues of polyp images into the SAM image encoder as a domain prior while capturing global dependencies at various scales, leading to more accurate segmentation results. Extensive experiments on five benchmark datasets show that SAM-Mamba outperforms traditional CNN, ViT, and Adapter-based models in both quantitative and qualitative measures. Additionally, SAM-Mamba demonstrates excellent adaptability to unseen datasets, making it highly suitable for real-time clinical use.

Via

Access Paper or Ask Questions