Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision communities. In this paper, we present a novel image captioning architecture that better explores the semantics available in captions and leverages them to enhance both image representation and caption generation. Our model first constructs caption-guided visual relationship graphs that introduce beneficial inductive bias via weakly supervised multi-instance learning. The representation is then enhanced by aggregating the textual and visual features of neighbouring and contextual nodes. During generation, the model further incorporates visual relationships through multi-task learning, jointly predicting the word sequence and the corresponding object/predicate tag sequences. We perform extensive experiments on the MSCOCO dataset, showing that the proposed framework significantly outperforms the baselines and achieves state-of-the-art performance under a wide range of evaluation metrics.
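To make the multi-task generation step concrete, the sketch below shows one plausible realization of jointly predicting words and object/predicate tags: a shared decoder hidden state feeds two classification heads whose losses are summed. This is a minimal illustration under assumed module names and dimensions (`JointDecoderHead`, `hidden_size`, `vocab_size`, `tag_size`), not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class JointDecoderHead(nn.Module):
    """Hypothetical multi-task output layer: one shared decoder state,
    two classifiers -- one over the word vocabulary, one over
    object/predicate tags. Summing the losses lets gradients from
    both tasks shape the same hidden representation."""

    def __init__(self, hidden_size=512, vocab_size=10000, tag_size=3):
        super().__init__()
        self.word_head = nn.Linear(hidden_size, vocab_size)
        self.tag_head = nn.Linear(hidden_size, tag_size)
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, hidden, word_targets, tag_targets):
        # hidden: (batch, seq_len, hidden_size) decoder states
        word_logits = self.word_head(hidden)  # (B, T, vocab_size)
        tag_logits = self.tag_head(hidden)    # (B, T, tag_size)
        # Flatten time steps so CrossEntropyLoss sees (N, C) vs (N,)
        word_loss = self.criterion(word_logits.flatten(0, 1),
                                   word_targets.flatten())
        tag_loss = self.criterion(tag_logits.flatten(0, 1),
                                  tag_targets.flatten())
        return word_loss + tag_loss  # joint multi-task objective

# Usage sketch with random data: batch of 2 captions, 7 steps each.
head = JointDecoderHead()
states = torch.randn(2, 7, 512)
words = torch.randint(0, 10000, (2, 7))
tags = torch.randint(0, 3, (2, 7))  # e.g. object / predicate / other
loss = head(states, words, tags)
loss.backward()
```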