Title: InvSeg: Test-Time Prompt Inversion for Semantic Segmentation

Oct 15, 2024

Jiayi Lin, Jiabo Huang, Jian Hu, Shaogang Gong

Abstract: Visual-textual correlations in the attention maps derived from text-to-image diffusion models have proven beneficial to dense visual prediction tasks, e.g., semantic segmentation. However, a significant challenge arises from the input distributional discrepancy between the context-rich sentences used for image generation and the isolated class names typically employed in semantic segmentation, which hinders diffusion models from capturing accurate visual-textual correlations. To solve this, we propose InvSeg, a test-time prompt inversion method that tackles open-vocabulary semantic segmentation by inverting image-specific visual context into the text prompt embedding space, leveraging structure information derived from the diffusion model's reconstruction process to enrich text prompts so as to associate each class with a structure-consistent mask. Specifically, we introduce Contrastive Soft Clustering (CSC) to align derived masks with the image's structure information, softly selecting anchors for each class and calculating weighted distances to pull intra-class pixels closer while separating inter-class pixels, thereby ensuring mask distinction and internal consistency. By incorporating sample-specific context, InvSeg learns context-rich text prompts in embedding space and achieves accurate semantic alignment across modalities. Experiments show that InvSeg achieves state-of-the-art performance on the PASCAL VOC and Context datasets. Project page: https://jylin8100.github.io/InvSegProject/.
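The abstract describes CSC only at a high level, so the following is a minimal illustrative sketch of a soft, anchor-based contrastive clustering loss, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`soft_anchors`, `soft_contrastive_loss`), the use of a mask-weighted mean as each class's anchor, the squared Euclidean distance, and the hinge `margin`. Here `features` stands in for per-pixel structure features and `masks` for per-class soft masks.

```python
def soft_anchors(features, masks, eps=1e-8):
    """Assumed anchor choice: the mask-weighted mean feature of each class."""
    dim = len(features[0])
    anchors = []
    for class_mask in masks:
        total = sum(class_mask) + eps  # eps guards against an empty mask
        anchors.append([
            sum(w * f[d] for w, f in zip(class_mask, features)) / total
            for d in range(dim)
        ])
    return anchors


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def soft_contrastive_loss(features, masks, margin=1.0):
    """Hypothetical soft contrastive loss over pixels and class anchors.

    features: list of N feature vectors, one per pixel.
    masks:    list of K soft masks, each a list of N weights in [0, 1].
    A pixel is pulled toward an anchor in proportion to its mask weight
    and pushed at least `margin` away in proportion to (1 - weight).
    """
    anchors = soft_anchors(features, masks)
    total = 0.0
    for n, feat in enumerate(features):
        for k, anchor in enumerate(anchors):
            d = sq_dist(feat, anchor)
            w = masks[k][n]
            total += w * d + (1.0 - w) * max(0.0, margin - d)
    return total / len(features)


if __name__ == "__main__":
    # Toy data: two tight groups of pixels in a 2-D feature space.
    feats = [[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]]
    aligned = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
    mixed = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
    print("aligned masks:", soft_contrastive_loss(feats, aligned))
    print("mixed masks:  ", soft_contrastive_loss(feats, mixed))
```

In this toy setup, masks that follow the feature structure give a lower loss than masks that cut across it, which is the behavior the abstract attributes to CSC. In a test-time inversion setting, a loss of this kind would be minimized with respect to the text prompt embeddings that produce the masks; that optimization loop is not shown here.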
