Segmentation localizes objects in an image at a fine-grained, per-pixel scale. Segmentation benefits from humans in the loop, who provide additional input indicating which objects to segment, typically through a combination of foreground and background clicks. Applications include photo editing and novel dataset annotation, where human annotators leverage an existing segmentation model rather than drawing raw pixel-level annotations. We propose a new segmentation process, Text + Click segmentation, in which a model takes as input an image, a text phrase describing the class to segment, and a single foreground click specifying the instance to segment. Compared to previous approaches, we leverage open-vocabulary image-text models to support a wide range of text prompts. Conditioning segmentations on text prompts improves the accuracy of segmentations on novel or unseen classes. We demonstrate that combining a single user-specified foreground click with a text prompt allows a model to better disambiguate overlapping or co-occurring semantic categories, such as "tie", "suit", and "person". We evaluate these results across common segmentation datasets such as refCOCO, COCO, VOC, and OpenImages. Source code is available here.
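
To make the Text + Click interface concrete, the following is a minimal sketch of the input/output contract described above, assuming a hypothetical click- and text-conditioned segmenter; the class and function names (`TextClickQuery`, `segment`) are illustrative assumptions, not the actual implementation.

```python
# Minimal sketch of the Text + Click interface: image + text phrase + one
# foreground click in, binary instance mask out. All names here are
# hypothetical illustrations, not the authors' released code.

from dataclasses import dataclass
from typing import Tuple

import numpy as np


@dataclass
class TextClickQuery:
    """A single Text + Click segmentation request."""
    image: np.ndarray        # H x W x 3 RGB image
    text: str                # open-vocabulary class description, e.g. "tie"
    click: Tuple[int, int]   # (row, col) of one foreground click on the target instance


def segment(model, query: TextClickQuery) -> np.ndarray:
    """Return a binary H x W mask for the instance selected by text + click.

    `model` stands in for any open-vocabulary, click-conditioned segmenter
    that maps (image, text, click) to a per-pixel foreground probability map.
    """
    probs = model(query.image, query.text, query.click)
    return (probs > 0.5).astype(np.uint8)
```

In this sketch, the text prompt narrows the prediction to a semantic category while the single click selects the specific instance, which is the disambiguation role described in the abstract.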