Title: TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models

Oct 07, 2024

Rabin Adhikari, Safal Thapaliya, Manish Dhakal, Bishesh Khanal

Abstract: Vision-Language Models (VLMs) have shown impressive performance in vision tasks, but adapting them to new domains often requires expensive fine-tuning. Prompt tuning techniques, including textual, visual, and multimodal prompting, offer efficient alternatives by leveraging learnable prompts. However, their application to Vision-Language Segmentation Models (VLSMs) and their evaluation under significant domain shifts remain unexplored. This work presents an open-source benchmarking framework, TuneVLSeg, that integrates various unimodal and multimodal prompt tuning techniques into VLSMs, making prompt tuning usable for downstream segmentation datasets with any number of classes. TuneVLSeg includes $6$ prompt tuning strategies at various prompt depths, used in $2$ VLSMs, for a total of $8$ different combinations. We test these prompt tuning strategies on $8$ diverse medical datasets, including $3$ radiology datasets (breast tumor, echocardiograph, chest X-ray pathologies) and $5$ non-radiology datasets (polyp, ulcer, skin cancer), and on two natural-domain segmentation datasets. Our study found that textual prompt tuning struggles under significant domain shifts, such as from natural-domain images to medical data. Furthermore, visual prompt tuning, which has fewer hyperparameters than multimodal prompt tuning, often achieves performance competitive with multimodal approaches, making it a valuable first attempt. Our work advances the understanding and applicability of different prompt-tuning techniques for robust domain-specific segmentation. The source code is available at https://github.com/naamiinepal/tunevlseg.

* Accepted at ACCV 2024 (oral presentation)
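
The abstract refers to "learnable prompts" without showing what they look like. As a rough illustration only, the sketch below shows the general idea behind shallow textual prompt tuning: a few trainable vectors are placed in front of the frozen token embeddings, and only those vectors would be updated during training. This is not the TuneVLSeg API; the function names, sizes, and the plain-list representation are all assumptions made for the example.

```python
import random


def init_prompts(num_prompts, dim, seed=0):
    """Create small random vectors standing in for learnable prompt embeddings."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_prompts)]


def prepend_prompts(prompts, token_embeddings):
    """Place the learnable prompts in front of the frozen token embeddings."""
    dim = len(token_embeddings[0])
    if any(len(p) != dim for p in prompts):
        raise ValueError("prompt and token embedding widths must match")
    return prompts + token_embeddings


# Hypothetical sizes, chosen only for illustration.
EMBED_DIM = 8
NUM_PROMPTS = 4

# Stand-in for a frozen encoder's embeddings of a 5-token class description.
frozen_tokens = [[0.1 * i] * EMBED_DIM for i in range(5)]

prompts = init_prompts(NUM_PROMPTS, EMBED_DIM)
sequence = prepend_prompts(prompts, frozen_tokens)

# Only the prompt vectors would receive gradient updates during tuning.
trainable = sum(len(p) for p in prompts)
```

In this toy setup the number of trainable values is `NUM_PROMPTS * EMBED_DIM`, independent of the size of the frozen encoder, which is the source of the efficiency the abstract mentions. Visual and multimodal variants apply the same idea to image patch embeddings or to both modalities.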
