Title: Large-Scale Bidirectional Training for Zero-Shot Image Captioning

Nov 15, 2022

Taehoon Kim, Mark Marsden, Pyunghwan Ahn, Sangyun Kim, Sihaeng Lee, Alessandra Sala, Seung Hwan Kim

Abstract: When trained on large-scale datasets, image captioning models can understand the content of images from a general domain but often fail to generate accurate, detailed captions. To improve performance, pretraining-and-finetuning has been a key strategy for image captioning. However, we find that large-scale bidirectional training between image and text enables zero-shot image captioning. In this paper, we introduce Bidirectional Image Text Training in largER Scale, BITTERS, an efficient training and inference framework for zero-shot image captioning. We also propose a new evaluation benchmark that comprises high-quality datasets and an extensive set of metrics to properly evaluate zero-shot captioning accuracy and societal bias. We additionally provide an efficient finetuning approach for keyword extraction. We show that careful selection of a large-scale training set and model architecture is the key to achieving zero-shot image captioning.

* arXiv preprint. Work in progress.
