Title: The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models

Apr 18, 2024

Cheng Shi, Sibei Yang

Abstract: Foundation models pre-trained on large amounts of data have demonstrated impressive zero-shot capabilities in various downstream tasks. However, in object detection and instance segmentation, two fundamental computer vision tasks that rely heavily on extensive human annotations, foundation models such as SAM and DINO struggle to achieve satisfactory performance. In this study, we reveal that the devil is in the object boundary, i.e., these foundation models fail to discern boundaries between individual objects. For the first time, we show that CLIP, which has never accessed any instance-level annotations, can provide a highly beneficial and strong instance-level boundary prior in the clustering results of a particular intermediate layer. Following this surprising observation, we propose Zip, which Zips up CLIP and SAM in a novel classification-first-then-discovery pipeline, enabling annotation-free, complex-scene-capable, open-vocabulary object detection and instance segmentation. Zip significantly boosts SAM's mask AP on the COCO dataset by 12.5% and establishes state-of-the-art performance in various settings, including training-free, self-training, and label-efficient finetuning. Furthermore, annotation-free Zip even achieves performance comparable to the best-performing open-vocabulary object detectors that use base annotations. Code is released at https://github.com/ChengShiest/Zip-Your-CLIP
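
To make the idea of a "boundary prior in the clustering results" concrete, here is a minimal sketch, not the authors' implementation. It assumes that patch-level features from some intermediate layer are already available as an (H, W, D) array; here they are replaced by synthetic features for two side-by-side regions. The function names (`kmeans`, `boundary_map`, `synthetic_patch_features`), the farthest-point initialisation, and the neighbour-difference rule for marking boundaries are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def kmeans(features, k, iters=10):
    """Plain k-means on an (N, D) array; returns an (N,) array of cluster ids.

    Centres are initialised greedily (farthest-point), an assumption made
    here only to keep the sketch deterministic.
    """
    centers = [features[0]]
    for _ in range(k - 1):
        d = np.min([((features - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(features[d.argmax()])
    centers = np.stack(centers).astype(float)
    for _ in range(iters):
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = features[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return labels

def boundary_map(label_grid):
    """Mark patches whose right or bottom neighbour has a different cluster id."""
    boundary = np.zeros(label_grid.shape, dtype=bool)
    boundary[:, :-1] |= label_grid[:, :-1] != label_grid[:, 1:]
    boundary[:-1, :] |= label_grid[:-1, :] != label_grid[1:, :]
    return boundary

def synthetic_patch_features(h=8, w=8, dim=16, seed=0):
    """Stand-in for real intermediate-layer features: two regions, left and right."""
    rng = np.random.default_rng(seed)
    feats = rng.normal(0.0, 0.05, size=(h, w, dim))
    feats[:, : w // 2, 0] += 1.0   # left region
    feats[:, w // 2 :, 1] += 1.0   # right region
    return feats

feats = synthetic_patch_features()
h, w, d = feats.shape
labels = kmeans(feats.reshape(-1, d), k=2).reshape(h, w)
edges = boundary_map(labels)
print(edges.astype(int))
```

In this toy setting the cluster labels change only between the two regions, so the boundary map lights up along a single column of patches. With real features, the choice of layer and the number of clusters would have to be taken from the paper or the released code.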

* ICLR 2024. Code is released at https://github.com/ChengShiest/Zip-Your-CLIP
