Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Adversarial Testing for Visual Grounding via Image-Aware Property Reduction

Mar 02, 2024

Zhiyuan Chang, Mingyang Li, Junjie Wang, Cheng Li, Boyu Wu, Fanjiang Xu, Qing Wang

Figure 1 for Adversarial Testing for Visual Grounding via Image-Aware Property Reduction

Figure 2 for Adversarial Testing for Visual Grounding via Image-Aware Property Reduction

Figure 3 for Adversarial Testing for Visual Grounding via Image-Aware Property Reduction

Figure 4 for Adversarial Testing for Visual Grounding via Image-Aware Property Reduction

Share this with someone who'll enjoy it:

Abstract:Due to the advantages of fusing information from various modalities, multimodal learning is gaining increasing attention. Being a fundamental task of multimodal learning, Visual Grounding (VG), aims to locate objects in images through natural language expressions. Ensuring the quality of VG models presents significant challenges due to the complex nature of the task. In the black box scenario, existing adversarial testing techniques often fail to fully exploit the potential of both modalities of information. They typically apply perturbations based solely on either the image or text information, disregarding the crucial correlation between the two modalities, which would lead to failures in test oracles or an inability to effectively challenge VG models. To this end, we propose PEELING, a text perturbation approach via image-aware property reduction for adversarial testing of the VG model. The core idea is to reduce the property-related information in the original expression meanwhile ensuring the reduced expression can still uniquely describe the original object in the image. To achieve this, PEELING first conducts the object and properties extraction and recombination to generate candidate property reduction expressions. It then selects the satisfied expressions that accurately describe the original object while ensuring no other objects in the image fulfill the expression, through querying the image with a visual understanding technique. We evaluate PEELING on the state-of-the-art VG model, i.e. OFA-VG, involving three commonly used datasets. Results show that the adversarial tests generated by PEELING achieves 21.4% in MultiModal Impact score (MMI), and outperforms state-of-the-art baselines for images and texts by 8.2%--15.1%.

* 14pages, 6 figures

View paper on

Share this with someone who'll enjoy it:

Title:Adversarial Testing for Visual Grounding via Image-Aware Property Reduction

Paper and Code