Title: Towards Training-free Open-world Segmentation via Image Prompting Foundation Models

Oct 17, 2023

Lv Tang, Peng-Tao Jiang, Hao-Ke Xiao, Bo Li

Abstract: Foundation models have shifted the paradigm in computer vision, mirroring the impact of large language models on natural language processing. This paper explores open-world segmentation and presents Image Prompt Segmentation (IPSeg), a training-free approach built on vision foundation models and image prompting. IPSeg uses a single image containing a subjective visual concept as a flexible prompt to query vision foundation models such as DINOv2 and Stable Diffusion. The approach extracts robust features for the prompt image and the input image, then matches the input representations to the prompt representations via a novel feature interaction module, generating point prompts that highlight target objects in the input image. These point prompts then guide the Segment Anything Model to segment the target object in the input image. Because it requires no training, the method offers a more efficient and scalable solution. Experiments on COCO, PASCAL VOC, and other datasets demonstrate IPSeg's efficacy for flexible open-world segmentation using intuitive image prompts. This work pioneers tapping foundation models for open-world understanding through visual concepts conveyed in images.
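
The matching step described in the abstract can be illustrated with a minimal sketch. This is not the paper's feature interaction module, whose details the abstract does not give. It is a plain cosine-similarity stand-in that assumes patch features have already been extracted for both images. The function name `point_prompts_from_features`, the `top_k` parameter, and the default patch size of 14 pixels are assumptions made for illustration only.

```python
import numpy as np

def point_prompts_from_features(prompt_feats, input_feats, grid_hw,
                                patch_size=14, top_k=3):
    """Pick point prompts by matching input patches to prompt patches.

    prompt_feats: (P, D) features of the prompt image patches.
    input_feats:  (H*W, D) features of the input image patches, row-major.
    grid_hw:      (H, W) shape of the input patch grid.
    Returns (points, scores): points is (top_k, 2) in (x, y) pixel
    coordinates, scores is the similarity of each selected patch.
    """
    h, w = grid_hw
    if input_feats.shape[0] != h * w:
        raise ValueError("input_feats must have H*W rows")

    # L2-normalise so the dot product is cosine similarity.
    eps = 1e-8
    p = prompt_feats / (np.linalg.norm(prompt_feats, axis=1, keepdims=True) + eps)
    x = input_feats / (np.linalg.norm(input_feats, axis=1, keepdims=True) + eps)

    # Similarity of every input patch to every prompt patch: (H*W, P).
    sim = x @ p.T
    # Score each input patch by its best-matching prompt patch.
    score = sim.max(axis=1)

    # Keep the top_k highest-scoring input patches.
    top = np.argsort(-score)[:top_k]
    rows, cols = np.divmod(top, w)

    # Convert patch indices to patch-centre pixel coordinates (x, y).
    points = np.stack([(cols + 0.5) * patch_size,
                       (rows + 0.5) * patch_size], axis=1)
    return points, score[top]
```

In a full pipeline, the returned `(x, y)` points would be passed to a promptable segmenter such as the Segment Anything Model as foreground point prompts. The feature extraction itself (for example with DINOv2 or Stable Diffusion) is outside this sketch.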
