Title: Image Hijacks: Adversarial Images can Control Generative Models at Runtime

Sep 18, 2023

Luke Bailey, Euan Ong, Stuart Russell, Scott Emmons

Abstract: Are foundation models secure from malicious actors? In this work, we focus on the image input to a vision-language model (VLM). We discover image hijacks, adversarial images that control generative models at runtime. We introduce Behaviour Matching, a general method for creating image hijacks, and we use it to explore three types of attacks. Specific string attacks generate arbitrary output of the adversary's choice. Leak context attacks leak information from the context window into the output. Jailbreak attacks circumvent a model's safety training. We study these attacks against LLaVA, a state-of-the-art VLM based on CLIP and LLaMA-2, and find that all our attack types have above a 90% success rate. Moreover, our attacks are automated and require only small image perturbations. These findings raise serious concerns about the security of foundation models. If image hijacks are as difficult to defend against as adversarial examples in CIFAR-10, then it might be many years before a solution is found -- if it even exists.

* Project page at https://image-hijacks.github.io
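
The abstract says the attacks are automated and need only small image perturbations, but it does not spell out how Behaviour Matching works. As a rough illustration of the general idea of a bounded adversarial perturbation, the sketch below runs projected signed-gradient descent against a toy linear softmax model. This is a generic, well-known technique and not the authors' method; `toy_model`, `craft_perturbed_image`, the random weights, and the values of `eps`, `step`, and `iters` are all hypothetical choices made for this example.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model(weights, image):
    """Hypothetical stand-in for a real model: one linear layer plus softmax."""
    logits = [sum(w * x for w, x in zip(row, image)) for row in weights]
    return softmax(logits)


def craft_perturbed_image(weights, image, target, eps=0.1, step=0.01, iters=200):
    """Projected signed-gradient descent on the cross-entropy toward `target`.

    Every pixel stays within `eps` of the original and inside [0, 1].
    Returns the iterate with the highest target probability seen so far.
    """
    adv = list(image)
    best, best_p = list(adv), toy_model(weights, adv)[target]
    for _ in range(iters):
        probs = toy_model(weights, adv)
        # Gradient of -log p[target] w.r.t. the logits is probs - onehot(target).
        dlogits = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
        # Chain rule through the linear layer gives the gradient w.r.t. pixels.
        grad = [sum(dlogits[k] * weights[k][i] for k in range(len(weights)))
                for i in range(len(adv))]
        for i, g in enumerate(grad):
            sign = (g > 0) - (g < 0)
            x = adv[i] - step * sign
            # Project into the eps-ball around the original pixel, then into [0, 1].
            x = min(max(x, image[i] - eps), image[i] + eps)
            adv[i] = min(max(x, 0.0), 1.0)
        p = toy_model(weights, adv)[target]
        if p > best_p:
            best, best_p = list(adv), p
    return best


random.seed(0)
n_pixels, n_outputs = 16, 4  # tiny sizes, chosen only to keep the example fast
weights = [[random.uniform(-1, 1) for _ in range(n_pixels)] for _ in range(n_outputs)]
image = [random.random() for _ in range(n_pixels)]
target = 2  # index of the output the perturbation should favour

adv = craft_perturbed_image(weights, image, target)
p_before = toy_model(weights, image)[target]
p_after = toy_model(weights, adv)[target]
max_change = max(abs(a - b) for a, b in zip(adv, image))
print(f"target probability: {p_before:.3f} -> {p_after:.3f}, max pixel change {max_change:.3f}")
```

The projection step is what keeps the perturbation "small" in this sketch: no pixel can move more than `eps` from its original value. A real attack on a VLM would need gradients through the full model and a loss defined over generated text, neither of which this toy example attempts.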
