Title: Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning

Jun 04, 2024

Wenyan Li, Jiaang Li, Rita Ramos, Raphael Tang, Desmond Elliott

Abstract: Recent advances in retrieval-augmented models for image captioning highlight the value of retrieving related captions for building efficient, lightweight models with strong domain-transfer capabilities. While these models demonstrate the success of retrieval augmentation, retrieval models are still far from perfect in practice, and retrieved information can sometimes mislead the model's generation, hurting performance. In this paper, we analyze the robustness of the SmallCap retrieval-augmented captioning model. Our analysis shows that SmallCap is sensitive to tokens that appear in the majority of the retrieved captions, and integrated gradients attribution shows that those tokens are likely to be copied into the final caption. Given these findings, we propose training the model by sampling retrieved captions from more diverse sets. This reduces the probability that the model learns to copy majority tokens and improves both in-domain and cross-domain performance.
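The two ideas in the abstract, "majority tokens" among the retrieved captions and training on a more diverse sample of retrievals, can be illustrated with a minimal Python sketch. Everything here is a hypothetical assumption for illustration: the function names `majority_tokens` and `sample_retrieved`, whitespace tokenization, the "more than half of the captions" threshold, and drawing k=4 captions uniformly from the top 7 retrieved. This is not the authors' implementation and does not reproduce SmallCap's actual retrieval pipeline.

```python
import random
from collections import Counter


def majority_tokens(captions):
    """Return tokens that occur in more than half of the given captions.

    Each token is counted at most once per caption (document frequency).
    Tokenization is naive lowercased whitespace splitting (an assumption).
    """
    doc_freq = Counter()
    for cap in captions:
        doc_freq.update(set(cap.lower().split()))
    threshold = len(captions) / 2
    return {tok for tok, n in doc_freq.items() if n > threshold}


def sample_retrieved(ranked_captions, k=4, pool_size=7, rng=None):
    """Hypothetical training-time sampler.

    Instead of always feeding the top-k retrieved captions, draw k captions
    uniformly without replacement from the top `pool_size` candidates, so the
    retrieved set varies across training steps. The sampled captions keep
    their original retrieval-rank order.
    """
    rng = rng or random.Random()
    pool = ranked_captions[:pool_size]
    if len(pool) <= k:
        return list(pool)
    chosen = sorted(rng.sample(range(len(pool)), k))  # indices, in rank order
    return [pool[i] for i in chosen]


# Example: "a" appears in all 3 captions and "dog" in 2 of 3.
print(majority_tokens(["a dog on grass", "a dog running", "a cat sleeping"]))
```

The example prints the set containing `"a"` and `"dog"` (set display order may vary). Keeping sampled captions in rank order is a design choice of this sketch, not something the abstract specifies; the abstract also describes sampling only as a change to training, so the sketch makes no claim about how captions are selected at inference.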

* 9 pages, long paper at ACL 2024
