Title: Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment

Dec 05, 2023

Brian Gordon, Yonatan Bitton, Yonatan Shafir, Roopal Garg, Xi Chen, Dani Lischinski, Daniel Cohen-Or, Idan Szpektor

Abstract: While existing image-text alignment models produce high-quality binary assessments, they fall short of pinpointing the exact source of misalignment. In this paper, we present a method that provides detailed textual and visual explanations of detected misalignments between text-image pairs. We leverage large language models and visual grounding models to automatically construct a training set that holds plausible misaligned captions for a given image, along with corresponding textual explanations and visual indicators. We also publish a new human-curated test set comprising ground-truth textual and visual misalignment annotations. Empirical results show that fine-tuning vision language models on our training set enables them to articulate misalignments and visually indicate them within images, outperforming strong baselines on both the binary alignment classification and the explanation generation tasks. Our method code and human-curated test set are available at: https://mismatch-quest.github.io/
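
To make the shape of the described training data concrete, here is a minimal sketch of what a single record might look like: an image, a caption, a binary alignment label, and, for misaligned pairs, a textual explanation plus a visual indicator. The class name `FeedbackExample`, all field names, the bounding-box convention, and the sample values are assumptions made for illustration; they are not taken from the paper or its released data.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed convention: (x_min, y_min, x_max, y_max) in pixels.
BoundingBox = Tuple[float, float, float, float]


@dataclass
class FeedbackExample:
    """Hypothetical record layout; not the paper's actual schema."""

    image_path: str
    caption: str
    is_aligned: bool
    textual_feedback: Optional[str] = None          # explanation of the mismatch
    misaligned_region: Optional[BoundingBox] = None  # visual indicator in the image

    def __post_init__(self) -> None:
        # An aligned pair has nothing to explain or highlight.
        if self.is_aligned and (self.textual_feedback or self.misaligned_region):
            raise ValueError("aligned pairs should not carry misalignment feedback")
        # A region, when present, must have positive width and height.
        if self.misaligned_region is not None:
            x0, y0, x1, y1 = self.misaligned_region
            if not (x0 < x1 and y0 < y1):
                raise ValueError("bounding box must have positive width and height")


# Made-up sample values, for illustration only.
example = FeedbackExample(
    image_path="images/0001.jpg",
    caption="A brown dog sleeping on a red sofa",
    is_aligned=False,
    textual_feedback="The sofa in the image is blue, not red",
    misaligned_region=(40.0, 120.0, 420.0, 360.0),
)
```

Keeping the binary label next to the optional explanation and region mirrors the two tasks named in the abstract (alignment classification and explanation generation), though the actual data format may differ.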
