Title: Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval

Aug 23, 2018

Niluthpol Chowdhury Mithun, Rameswar Panda, Evangelos E. Papalexakis, Amit K. Roy-Chowdhury

Abstract: Cross-modal retrieval between visual data and natural language description remains a long-standing challenge in multimedia. While recent image-text retrieval methods offer great promise by learning deep representations aligned across modalities, most of these methods are plagued by the issue of training with small-scale datasets covering a limited number of images with ground-truth sentences. Moreover, it is extremely expensive to create a larger dataset by annotating millions of images with sentences and may lead to a biased model. Inspired by the recent success of webly supervised learning in deep neural networks, we capitalize on readily-available web images with noisy annotations to learn robust image-text joint representation. Specifically, our main idea is to leverage web images and corresponding tags, along with fully annotated datasets, in training for learning the visual-semantic joint embedding. We propose a two-stage approach for the task that can augment a typical supervised pair-wise ranking loss based formulation with weakly-annotated web images to learn a more robust visual-semantic embedding. Experiments on two standard benchmark datasets demonstrate that our method achieves a significant performance gain in image-text retrieval compared to state-of-the-art approaches.
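The abstract refers to a "typical supervised pair-wise ranking loss" without spelling it out. As a rough illustration only, the sketch below shows one common form of such a loss for a joint image-text embedding: a bidirectional hinge loss over cosine similarities, summed over all mismatched pairs in a batch. The function names, the margin value of 0.2, and the sum-over-negatives choice are assumptions made for this example and are not taken from the paper, whose exact formulation may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors (assumed non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_ranking_loss(image_embs, text_embs, margin=0.2):
    """Bidirectional hinge ranking loss over a batch (illustrative sketch).

    image_embs[i] and text_embs[i] are assumed to be a matching pair;
    every other combination in the batch is treated as a negative.
    """
    n = len(image_embs)
    # sim[i][j] = similarity between image i and text j
    sim = [[cosine(image_embs[i], text_embs[j]) for j in range(n)]
           for i in range(n)]

    loss = 0.0
    for i in range(n):
        positive = sim[i][i]
        for j in range(n):
            if j == i:
                continue
            # Image i should score higher with its own text than with text j.
            loss += max(0.0, margin - positive + sim[i][j])
            # Text i should score higher with its own image than with image j.
            loss += max(0.0, margin - positive + sim[j][i])
    return loss


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
    swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
    print(pairwise_ranking_loss(images, aligned_texts))
    print(pairwise_ranking_loss(images, swapped_texts))
```

In this toy batch, the correctly aligned embeddings incur no penalty, while swapping the two captions makes every mismatched pair violate the margin. How the noisily tagged web images enter such an objective in the two-stage approach is not described in the abstract, so the sketch does not attempt to cover it.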

* ACM Multimedia 2018
