The vulnerability of Vision Large Language Models (VLLMs) to jailbreak attacks comes as no surprise. However, recent defense mechanisms against these attacks have reached near-saturation performance on benchmarks, often with minimal effort. This simultaneous high performance in both attack and defense presents a perplexing paradox, and resolving it is critical for advancing the development of trustworthy models. To address this research gap, we first investigate why VLLMs are prone to these attacks. We then make a key observation: existing defense mechanisms suffer from an \textbf{over-prudence} problem, causing them to abstain unexpectedly even on benign inputs. Additionally, we find that the two representative evaluation methods for jailbreak often exhibit only chance-level agreement, a limitation that can render evaluations of attack strategies and defense mechanisms misleading. Beyond these empirical observations, another contribution of this work is to repurpose off-the-shelf LLM guardrails as an effective alternative detector applied prior to VLLM response generation. We believe these findings offer useful insights for rethinking the foundational development of VLLM safety with respect to benchmark datasets, evaluation methods, and defense strategies.