Tsung-Yin Hsieh

(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs

Jul 24, 2023

Eugene Bagdasaryan, Tsung-Yin Hsieh, Ben Nassi, Vitaly Shmatikov

Abstract: We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks the (unmodified, benign) model about the perturbed image or audio, the perturbation steers the model to output the attacker-chosen text and/or make the subsequent dialog follow the attacker's instruction. We illustrate this attack with several proof-of-concept examples targeting LLaVa and PandaGPT.
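
The core idea in the abstract, finding a small input perturbation that steers a fixed model toward a chosen output, can be illustrated on a toy problem. The sketch below is not the authors' code and does not touch LLaVa or PandaGPT. It assumes a hypothetical linear "model" `W` that maps a 64-value "image" to scores over five tokens, and it uses a generic iterative signed-gradient search under a bound `eps`. All names and parameter values here are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def target_loss(W, x, target):
    """Cross-entropy between the toy model's output and the target token."""
    return -np.log(softmax(W @ x)[target])

def craft_perturbation(W, x, target, eps=0.1, step=0.01, iters=50):
    """Iterative signed-gradient search for a bounded perturbation.

    Hypothetical stand-in: W is a toy linear model, not a multi-modal LLM.
    """
    delta = np.zeros_like(x)
    onehot = np.zeros(W.shape[0])
    onehot[target] = 1.0
    for _ in range(iters):
        p = softmax(W @ (x + delta))
        grad = W.T @ (p - onehot)                 # gradient of the loss w.r.t. the input
        delta = delta - step * np.sign(grad)      # step toward the target output
        delta = np.clip(delta, -eps, eps)         # keep the change within the bound
        delta = np.clip(x + delta, 0.0, 1.0) - x  # keep pixel values in [0, 1]
    return delta

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 64))        # toy model weights (assumption)
x = rng.uniform(size=64)            # toy "image" with values in [0, 1]
target = int(np.argmin(W @ x))      # pick the token the model favors least

delta = craft_perturbation(W, x, target)
loss_before = target_loss(W, x, target)
loss_after = target_loss(W, x + delta, target)
```

Comparing `loss_before` with `loss_after` shows how far the bounded perturbation moved the toy model toward the target token. In a real multi-modal setting the loss would instead be computed over the model's generated text, which this sketch does not attempt.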
