Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks

Dec 07, 2023

Shuli Jiang, Swanand Ravindra Kadhe, Yi Zhou, Ling Cai, Nathalie Baracaldo

Figure 1 for Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks

Figure 2 for Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks

Figure 3 for Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks

Figure 4 for Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks

Share this with someone who'll enjoy it:

Abstract:Growing applications of large language models (LLMs) trained by a third party raise serious concerns on the security vulnerability of LLMs.It has been demonstrated that malicious actors can covertly exploit these vulnerabilities in LLMs through poisoning attacks aimed at generating undesirable outputs. While poisoning attacks have received significant attention in the image domain (e.g., object detection), and classification tasks, their implications for generative models, particularly in the realm of natural language generation (NLG) tasks, remain poorly understood. To bridge this gap, we perform a comprehensive exploration of various poisoning techniques to assess their effectiveness across a range of generative tasks. Furthermore, we introduce a range of metrics designed to quantify the success and stealthiness of poisoning attacks specifically tailored to NLG tasks. Through extensive experiments on multiple NLG tasks, LLMs and datasets, we show that it is possible to successfully poison an LLM during the fine-tuning stage using as little as 1\% of the total tuning data samples. Our paper presents the first systematic approach to comprehend poisoning attacks targeting NLG tasks considering a wide range of triggers and attack settings. We hope our findings will assist the AI security community in devising appropriate defenses against such threats.

* 19 pages, 6 figures. Published at NeurIPS 2023 Workshop on Backdoors in Deep Learning: The Good, the Bad, and the Ugly

View paper on

Share this with someone who'll enjoy it:

Title:Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks

Paper and Code