Chaowei Xiao

Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset

Nov 05, 2024

InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models

Oct 30, 2024

FATH: Authentication-based Test-time Defense against Indirect Prompt Injection Attacks

Oct 28, 2024

SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment

Oct 18, 2024

AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs

Oct 14, 2024

RePD: Defending Jailbreak Attack through a Retrieval-based Prompt Decomposition Process

Oct 11, 2024

LeanAgent: Lifelong Learning for Formal Theorem Proving

Oct 08, 2024

Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges

Sep 30, 2024

HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection

Sep 26, 2024

EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage

Sep 17, 2024