Title: IDEATOR: Jailbreaking VLMs Using VLMs

Oct 29, 2024

Ruofan Wang, Bo Wang, Xingjun Ma, Yu-Gang Jiang

Abstract: As large Vision-Language Models (VLMs) continue to gain prominence, ensuring their safe deployment in real-world applications has become a critical concern. Recently, significant research efforts have focused on evaluating the robustness of VLMs against jailbreak attacks. Because multi-modal data is difficult to obtain, current studies often assess VLM robustness by generating adversarial or query-relevant images from harmful text datasets. However, jailbreak images generated this way have clear limitations. Adversarial images require white-box access to the target VLM and are relatively easy to defend against, while query-relevant images must be linked to the target harmful content, limiting their diversity and effectiveness. In this paper, we propose a novel jailbreak method named IDEATOR, which autonomously generates malicious image-text pairs for black-box jailbreak attacks. IDEATOR is a VLM-based approach inspired by our conjecture that a VLM itself might be a powerful red team model for generating jailbreak prompts. Specifically, IDEATOR employs a VLM to generate jailbreak texts while leveraging a state-of-the-art diffusion model to create corresponding jailbreak images. Extensive experiments demonstrate the high effectiveness and transferability of IDEATOR. It jailbreaks MiniGPT-4 with a 94% success rate and transfers to LLaVA and InstructBLIP with success rates of 82% and 88%, respectively. IDEATOR uncovers previously unrecognized vulnerabilities in VLMs, calling for advanced safety mechanisms.
