I-An Huang

Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries

Apr 17, 2017

Yuting Zhang, Luyao Yuan, Yijie Guo, Zhiyuan He, I-An Huang, Honglak Lee

Abstract: Associating image regions with text queries has recently been explored as a new way to bridge visual and linguistic representations. A few pioneering approaches have been proposed based on recurrent neural language models trained generatively (e.g., generating captions), but they achieve somewhat limited localization accuracy. To better address natural-language-based visual entity localization, we propose a discriminative approach. We formulate a discriminative bimodal neural network (DBNet), which can be trained by a classifier with extensive use of negative samples. Our training objective encourages better localization on single images, incorporates text phrases in a broad range, and properly pairs image regions with text phrases into positive and negative examples. Experiments on the Visual Genome dataset demonstrate that the proposed DBNet significantly outperforms previous state-of-the-art methods both for localization on single images and for detection on multiple images. We also establish an evaluation protocol for natural-language visual detection.

* IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
