Title: Information-Theoretical Principled Trade-off between Jailbreakability and Stealthiness on Vision Language Models

Oct 02, 2024

Ching-Chia Kao, Chia-Mu Yu, Chun-Shien Lu, Chu-Song Chen

Abstract: In recent years, Vision-Language Models (VLMs) have advanced significantly, transforming tasks across many domains. Despite their capabilities, these models are susceptible to jailbreak attacks, which can compromise their safety and reliability. This paper explores the trade-off between jailbreakability and stealthiness in VLMs and presents a novel algorithm that detects non-stealthy jailbreak attacks and improves model robustness. We also introduce a stealthiness-aware jailbreak attack built on diffusion models, which highlights how difficult AI-generated content is to detect. Our approach uses Fano's inequality to explain the relationship between attack success rates and stealthiness scores, yielding an explainable framework for evaluating these threats. Together, these contributions aim to harden AI systems against sophisticated attacks so that their outputs remain aligned with ethical standards and user expectations.