Xiaogeng Liu

AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection

Feb 18, 2025

InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models

Oct 30, 2024

AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs

Oct 14, 2024

RePD: Defending Jailbreak Attack through a Retrieval-based Prompt Decomposition Process

Oct 11, 2024

MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

Jun 13, 2024

Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character

May 25, 2024

JailBreakV-28K: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks

Apr 03, 2024

Don't Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models

Mar 26, 2024

AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting

Mar 14, 2024

Automatic and Universal Prompt Injection Attacks against Large Language Models

Mar 07, 2024