Multimodal Large Language Models (MLLMs), built upon LLMs, have recently gained attention for their capabilities in image recognition and understanding. Although MLLMs are known to be vulnerable to adversarial attacks, the transferability of these attacks across different models remains limited, especially in the targeted attack setting. Existing methods primarily focus on vision-specific perturbations and struggle with the complex nature of vision-language modality alignment. In this work, we introduce the Dynamic Vision-Language Alignment (DynVLA) Attack, a novel approach that injects dynamic perturbations into the vision-language connector to enhance generalization across the diverse vision-language alignments of different models. Our experimental results show that DynVLA significantly improves the transferability of adversarial examples across various MLLMs, including BLIP2, InstructBLIP, MiniGPT4, LLaVA, and closed-source models such as Gemini.
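To make the core idea concrete, the sketch below illustrates one plausible reading of "dynamic perturbations into the vision-language connector": during a targeted attack on a surrogate model, the connector's output is re-perturbed at every optimization step (here via a PyTorch forward hook with fresh Gaussian noise), so the adversarial image cannot overfit to a single fixed alignment. This is a minimal toy illustration, not the paper's implementation: the `ToyMLLM` model, the hook-based noise injection, and all hyperparameters are hypothetical placeholders.

```python
# Hedged sketch of a DynVLA-style attack loop on a toy surrogate model.
# Assumptions (not from the paper): the surrogate exposes a `connector`
# module, the "dynamic" perturbation is Gaussian noise re-sampled per step,
# and the targeted objective is an MSE loss toward a target feature.
import torch
import torch.nn as nn


class ToyMLLM(nn.Module):
    """Stand-in surrogate MLLM: vision encoder -> connector -> LM head."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.vision_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))
        self.connector = nn.Linear(dim, dim)   # vision-language connector
        self.lm_head = nn.Linear(dim, dim)     # proxy for the language model

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.lm_head(self.connector(self.vision_encoder(image)))


def dynvla_style_attack(model: nn.Module, image: torch.Tensor,
                        target_feat: torch.Tensor, steps: int = 100,
                        eps: float = 8 / 255, alpha: float = 1 / 255,
                        noise_scale: float = 0.1) -> torch.Tensor:
    """Targeted attack that re-samples a connector perturbation every step."""
    delta = torch.zeros_like(image, requires_grad=True)

    def perturb_connector(_module, _inputs, output):
        # Dynamic part: inject fresh noise into the connector output,
        # simulating a different vision-language alignment per iteration.
        return output + noise_scale * torch.randn_like(output)

    for _ in range(steps):
        handle = model.connector.register_forward_hook(perturb_connector)
        loss = nn.functional.mse_loss(model(image + delta), target_feat)
        loss.backward()
        handle.remove()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()   # targeted: minimize the loss
            delta.clamp_(-eps, eps)              # keep perturbation L_inf-bounded
            delta.grad = None
    return (image + delta).clamp(0, 1).detach()


if __name__ == "__main__":
    torch.manual_seed(0)
    surrogate = ToyMLLM().eval()
    clean = torch.rand(1, 3, 32, 32)
    target = surrogate(torch.rand(1, 3, 32, 32)).detach()  # feature of a target image
    adv = dynvla_style_attack(surrogate, clean, target)
    print("L_inf perturbation:", (adv - clean).abs().max().item())
```

In practice the surrogate would be a real MLLM (e.g., a BLIP2-style model with a Q-Former connector) and the adversarial example would then be evaluated on the held-out target models; the toy classes above only demonstrate the per-step perturbation mechanism.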