Title: Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations

Mar 29, 2023

Vibashan VS, Ning Yu, Chen Xing, Can Qin, Mingfei Gao, Juan Carlos Niebles, Vishal M. Patel, Ran Xu

Abstract: Existing instance segmentation models learn task-specific information using manual mask annotations from base (training) categories. These mask annotations require tremendous human effort, limiting the scalability to annotate novel (new) categories. To alleviate this problem, Open-Vocabulary (OV) methods leverage large-scale image-caption pairs and vision-language models to learn novel categories. In summary, an OV method learns task-specific information using strong supervision from base annotations and novel category information using weak supervision from image-caption pairs. This difference between strong and weak supervision leads to overfitting on base categories, resulting in poor generalization towards novel categories. In this work, we overcome this issue by learning both base and novel categories from pseudo-mask annotations generated by the vision-language model in a weakly supervised manner using our proposed Mask-free OVIS pipeline. Our method automatically generates pseudo-mask annotations by leveraging the localization ability of a pre-trained vision-language model for objects present in image-caption pairs. The generated pseudo-mask annotations are then used to supervise an instance segmentation model, freeing the entire pipeline from any labour-expensive instance-level annotations and overfitting. Our extensive experiments show that our method trained with just pseudo-masks significantly improves the mAP scores on the MS-COCO dataset and OpenImages dataset compared to the recent state-of-the-art methods trained with manual masks. Code and models are available at https://vibashan.github.io/ovis-web/.

* Accepted to CVPR 2023. Project site: https://vibashan.github.io/ovis-web/
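
The abstract describes generating pseudo-mask annotations from the localization ability of a pre-trained vision-language model. As a rough illustration of that idea only, the sketch below thresholds a per-word activation map into a binary pseudo-mask and a tight bounding box. It is not the authors' pipeline: the function name `pseudo_mask_from_activation`, the `rel_threshold` parameter, and the toy activation map are all assumptions introduced here, and the actual method is documented on the project site.

```python
import numpy as np


def pseudo_mask_from_activation(activation, rel_threshold=0.5):
    """Threshold a 2-D activation map into a pseudo-mask and a tight box.

    `activation` is a hypothetical stand-in for the localization map a
    vision-language model might produce for one caption word.
    `rel_threshold` is an assumed hyperparameter, not a value from the paper.
    Returns (mask, box) with box = (x_min, y_min, x_max, y_max), or
    (all-False mask, None) when the map has no positive response.
    """
    activation = np.asarray(activation, dtype=float)
    peak = activation.max()
    if peak <= 0:
        return np.zeros(activation.shape, dtype=bool), None

    # Keep pixels whose response is at least a fraction of the peak response.
    mask = activation >= rel_threshold * peak

    # The tight box around the kept pixels serves as a pseudo box annotation.
    ys, xs = np.nonzero(mask)
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    return mask, box


# Toy 5x5 map: a strong 3x2 response plus one weak, spurious activation.
toy = np.zeros((5, 5))
toy[1:4, 2:4] = 1.0
toy[0, 0] = 0.2

mask, box = pseudo_mask_from_activation(toy)
print(box)              # (2, 1, 3, 3)
print(int(mask.sum()))  # 6
```

A relative threshold is used here so the cutoff adapts to the scale of each map; a real system would likely refine such coarse regions further before using them as supervision.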
