
Xiaogeng Liu

InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models

Oct 30, 2024

AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs

Oct 14, 2024

RePD: Defending Jailbreak Attack through a Retrieval-based Prompt Decomposition Process

Oct 11, 2024

MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

Jun 13, 2024

Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character

May 25, 2024

JailBreakV-28K: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks

Apr 03, 2024

Don't Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models

Mar 26, 2024

AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting

Mar 14, 2024

Automatic and Universal Prompt Injection Attacks against Large Language Models

Mar 07, 2024

DeceptPrompt: Exploiting LLM-driven Code Generation via Adversarial Natural Language Instructions

Dec 12, 2023