Title: Retrieval-Augmented Transformer for Image Captioning

Jul 26, 2022

Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Abstract: Image captioning models aim at connecting Vision and Language by providing natural language descriptions of input images. In the past few years, the task has been tackled by learning parametric models and proposing visual feature extraction advancements or by modeling better multi-modal connections. In this paper, we investigate the development of an image captioning approach with a kNN memory, with which knowledge can be retrieved from an external corpus to aid the generation process. Our architecture combines a knowledge retriever based on visual similarities, a differentiable encoder, and a kNN-augmented attention layer to predict tokens based on the past context and on text retrieved from the external memory. Experimental results, conducted on the COCO dataset, demonstrate that employing an explicit external memory can aid the generation process and increase caption quality. Our work opens up new avenues for improving image captioning models at larger scale.

* CBMI 2022
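
The abstract mentions a knowledge retriever based on visual similarities that pulls text from an external memory. The following is a minimal illustrative sketch of that general idea, not the authors' implementation: the abstract does not specify the feature extractor, the similarity measure, or the memory layout, so the use of cosine similarity, the function name `retrieve_knn_captions`, and the toy feature vectors and captions are all assumptions made for illustration.

```python
import numpy as np


def retrieve_knn_captions(query_feat, memory_feats, memory_captions, k=3):
    """Return the k (caption, similarity) pairs closest to the query image.

    Hypothetical helper: cosine similarity over image feature vectors is an
    assumption, chosen only to illustrate retrieval by visual similarity.
    """
    # L2-normalise so that a dot product equals cosine similarity.
    q = query_feat / np.linalg.norm(query_feat)
    m = memory_feats / np.linalg.norm(memory_feats, axis=1, keepdims=True)
    sims = m @ q
    # Indices of the k most similar memory entries, best first.
    top = np.argsort(-sims)[:k]
    return [(memory_captions[i], float(sims[i])) for i in top]


# Toy external memory: one made-up feature vector per stored caption.
memory_feats = np.array([
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
memory_captions = [
    "a dog runs on the beach",
    "a dog and a cat on a sofa",
    "a cat sleeps on a chair",
    "a plate of food on a table",
]

# Made-up feature vector for the input image.
query = np.array([1.0, 0.05, 0.0])
neighbours = retrieve_knn_captions(query, memory_feats, memory_captions, k=2)
for caption, score in neighbours:
    print(f"{score:.3f}  {caption}")
```

In a captioning model of the kind the abstract describes, the retrieved captions would then be encoded and made available to the decoder through an attention layer, alongside the previously generated tokens; that part is not sketched here.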
