We propose a margin-based loss for vision-language model pretraining that encourages gradient-based explanations to be consistent with region-level annotations. We refer to this objective as Attention Mask Consistency (AMC) and demonstrate that it achieves superior visual grounding performance compared to models that instead rely on region-level annotations to explicitly train an object detector such as Faster R-CNN. AMC works by encouraging gradient-based explanation masks to concentrate their attention scores mostly within annotated regions of interest for images that contain such annotations. In particular, a model trained with AMC on top of standard vision-language modeling objectives obtains a state-of-the-art accuracy of 86.59% on the Flickr30k visual grounding benchmark, an absolute improvement of 5.48% over the previous best model. Our approach also performs exceedingly well on established benchmarks for referring expression comprehension and, by design, offers the added benefit of gradient-based explanations that align better with human annotations.
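To make the idea of the objective concrete, the following is a minimal sketch of a margin-based consistency loss of the kind described above, not the paper's exact formulation: it penalizes the model whenever the mean explanation score outside an annotated region, plus a margin, exceeds the mean score inside the region. The function name `amc_style_margin_loss`, the margin value, and the use of a GradCAM-style heatmap are illustrative assumptions.

```python
import torch

def amc_style_margin_loss(heatmap: torch.Tensor,
                          region_mask: torch.Tensor,
                          margin: float = 0.1) -> torch.Tensor:
    """Hedged sketch of a margin-based attention-consistency loss.

    heatmap:     (B, H, W) non-negative explanation scores (e.g., GradCAM-style).
    region_mask: (B, H, W) binary mask marking the annotated region of interest.
    margin:      required gap between inside and outside mean scores (assumed value).
    """
    eps = 1e-6
    inside_mass = (heatmap * region_mask).sum(dim=(1, 2))
    outside_mass = (heatmap * (1 - region_mask)).sum(dim=(1, 2))
    inside_mean = inside_mass / (region_mask.sum(dim=(1, 2)) + eps)
    outside_mean = outside_mass / ((1 - region_mask).sum(dim=(1, 2)) + eps)
    # Hinge term: zero when the explanation already focuses inside the region
    # by at least `margin`, positive otherwise.
    return torch.clamp(margin + outside_mean - inside_mean, min=0).mean()
```

In practice such a term would be added, with some weight, to the standard vision-language pretraining objectives and applied only to images that carry region-level annotations.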