Title: Object Counts! Bringing Explicit Detections Back into Image Captioning

Apr 23, 2018

Josiah Wang, Pranava Madhyastha, Lucia Specia

Abstract: The use of explicit object detectors as an intermediate step in image captioning, which used to be an essential stage in early work, is often bypassed in the currently dominant end-to-end approaches, where the language model is conditioned directly on a mid-level image embedding. We argue that explicit detections provide rich semantic information and can thus serve as an interpretable representation for understanding why end-to-end image captioning systems work well. We provide an in-depth analysis of end-to-end image captioning by exploring a variety of cues that can be derived from such object detections. Our study reveals that end-to-end image captioning systems rely on matching image representations to generate captions, and that encodings of the frequency, size and position of objects are complementary and all play a role in forming a good image representation. It also reveals that different object categories contribute to image captioning in different ways.
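To make the idea of a detection-based image representation concrete, here is a minimal sketch (not the authors' implementation) that turns object-detector output into a fixed-length vector encoding, for each object category, the frequency, size and position of objects. The `Detection` type, the pixel box format (x0, y0, x1, y1), and the choice to describe size and position by the normalised area and centroid of the largest instance per category are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Detection:
    # Hypothetical detector output: category index and a pixel box (x0, y0, x1, y1).
    category: int
    box: Tuple[float, float, float, float]


def encode_detections(
    dets: Sequence[Detection], num_categories: int, image_w: float, image_h: float
) -> List[float]:
    """Assumed encoding: four slots per category, namely
    [count, normalised area of largest instance, its centroid x, its centroid y]."""
    feats = [[0.0, 0.0, 0.0, 0.0] for _ in range(num_categories)]
    img_area = image_w * image_h
    for d in dets:
        if not 0 <= d.category < num_categories:
            raise ValueError(f"unknown category {d.category}")
        x0, y0, x1, y1 = d.box
        # Size cue: box area as a fraction of the image area.
        area = max(0.0, x1 - x0) * max(0.0, y1 - y0) / img_area
        f = feats[d.category]
        f[0] += 1.0  # frequency cue: number of detected instances
        if area > f[1]:
            # Keep size and position of the largest instance only (an assumption).
            f[1] = area
            f[2] = (x0 + x1) / 2 / image_w  # position cue: normalised centroid x
            f[3] = (y0 + y1) / 2 / image_h  # position cue: normalised centroid y
    # Flatten to one vector that could condition a caption generator.
    return [v for f in feats for v in f]


if __name__ == "__main__":
    dets = [
        Detection(0, (0, 0, 50, 50)),
        Detection(0, (0, 0, 10, 10)),
        Detection(2, (50, 50, 100, 100)),
    ]
    print(encode_detections(dets, num_categories=3, image_w=100, image_h=100))
```

A vector like this could stand in for a CNN embedding as the input that conditions a caption generator; zeroing out the count, size or position slots is one way to probe how much each cue contributes.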

* Please cite: In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2018)