Title: BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models

Aug 23, 2024

Yige Li, Hanxun Huang, Yunhan Zhao, Xingjun Ma, Jun Sun

Abstract: Generative Large Language Models (LLMs) have made significant strides across various tasks, but they remain vulnerable to backdoor attacks, where specific triggers in the prompt cause the LLM to generate adversary-desired responses. While most backdoor research has focused on vision or text classification tasks, backdoor attacks in text generation have been largely overlooked. In this work, we introduce BackdoorLLM, the first comprehensive benchmark for studying backdoor attacks on LLMs. BackdoorLLM features: 1) a repository of backdoor benchmarks with a standardized training pipeline, 2) diverse attack strategies, including data poisoning, weight poisoning, hidden state attacks, and chain-of-thought attacks, 3) extensive evaluations with over 200 experiments on 8 attacks across 7 scenarios and 6 model architectures, and 4) key insights into the effectiveness and limitations of backdoors in LLMs. We hope BackdoorLLM will raise awareness of backdoor threats and contribute to advancing AI safety. The code is available at https://github.com/bboylyg/BackdoorLLM.
