Title: Read, look and detect: Bounding box annotation from image-caption pairs

Jun 09, 2023

Eduardo Hugo Sanchez

Abstract: Various methods have been proposed to detect objects while reducing the cost of data annotation. For instance, weakly supervised object detection (WSOD) methods rely only on image-level annotations during training. Unfortunately, data annotation remains expensive, since annotators must provide the categories describing the content of each image, and labeling is restricted to a fixed set of categories. In this paper, we propose a method to locate and label objects in an image by using a form of weaker supervision: image-caption pairs. By leveraging recent advances in vision-language (VL) models and self-supervised vision transformers (ViTs), our method is able to perform phrase grounding and object detection in a weakly supervised manner. Our experiments demonstrate the effectiveness of our approach by achieving a 47.51% recall@1 score in phrase grounding on Flickr30k Entities and establishing a new state of the art in object detection by achieving 21.1 mAP@50 and 10.5 mAP@50:95 on MS COCO when exclusively relying on image-caption pairs.
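To make the recall@1 figure concrete, here is a minimal Python sketch of how such a score is commonly computed for phrase grounding. It assumes one top-ranked predicted box per phrase and counts a prediction as correct when its intersection over union (IoU) with the ground-truth box is at least 0.5. The abstract does not state the threshold or the evaluation protocol, so the 0.5 value, the box format, and the function names `iou` and `recall_at_1` are illustrative assumptions, not the paper's code.

```python
from typing import Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(
    predicted: Sequence[Box],
    ground_truth: Sequence[Box],
    threshold: float = 0.5,  # assumed IoU threshold, not taken from the abstract
) -> float:
    """Fraction of phrases whose top-1 box overlaps the ground truth enough."""
    if len(predicted) != len(ground_truth):
        raise ValueError("expected one predicted box per ground-truth box")
    if not predicted:
        return 0.0
    hits = sum(iou(p, g) >= threshold for p, g in zip(predicted, ground_truth))
    return hits / len(predicted)
```

Under these assumptions, a reported 47.51% recall@1 would mean that for roughly 47.5 of every 100 caption phrases, the single highest-scoring box matches the annotated region.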
