Title: Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

Feb 26, 2024

Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster(+2 more)

Abstract: As large language models (LLMs) become increasingly prevalent across many real-world applications, understanding and enhancing their robustness to user inputs is of paramount importance. Existing methods for identifying adversarial prompts tend to focus on specific domains, lack diversity, or require extensive human annotations. To address these limitations, we present Rainbow Teaming, a novel approach for producing a diverse collection of adversarial prompts. Rainbow Teaming casts adversarial prompt generation as a quality-diversity problem, and uses open-ended search to generate prompts that are both effective and diverse. It can uncover a model's vulnerabilities across a broad range of domains including, in this paper, safety, question answering, and cybersecurity. We also demonstrate that fine-tuning on synthetic data generated by Rainbow Teaming improves the safety of state-of-the-art LLMs without hurting their general capabilities and helpfulness, paving the path to open-ended self-improvement.
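To make the "quality-diversity" framing concrete, here is a minimal sketch of a MAP-Elites-style loop, one common quality-diversity algorithm. The abstract does not specify the search procedure, so the choice of algorithm is an assumption. The descriptor axes (`CATEGORIES`, `STYLES`), the `mutate` operator, and the `score` function are likewise hypothetical stand-ins. A real system would presumably mutate prompts and judge responses with language models, whereas this toy uses placeholder strings and an arbitrary deterministic score.

```python
import random

# Hypothetical descriptor axes; the abstract does not name them.
CATEGORIES = ["safety", "question_answering", "cybersecurity"]
STYLES = ["role_play", "hypothetical", "misspelling"]


def mutate(parent, rng):
    """Stand-in mutation: change one descriptor and perturb the placeholder text."""
    category, style, text = parent
    if rng.random() < 0.5:
        category = rng.choice(CATEGORIES)
    else:
        style = rng.choice(STYLES)
    return (category, style, f"{text}+{rng.randint(0, 9)}")


def score(candidate):
    """Stand-in effectiveness score in [0, 1]; arbitrary but deterministic."""
    return (sum(ord(c) for c in candidate[2]) % 101) / 100.0


def quality_diversity_search(iterations=500, seed=0):
    """Keep the best-scoring candidate found so far in each (category, style) cell."""
    rng = random.Random(seed)
    archive = {}  # (category, style) -> (fitness, candidate)
    start = (CATEGORIES[0], STYLES[0], "seed")
    archive[start[:2]] = (score(start), start)
    for _ in range(iterations):
        # Sample a parent from the archive, mutate it, and evaluate the child.
        _, parent = rng.choice(list(archive.values()))
        child = mutate(parent, rng)
        cell, fitness = child[:2], score(child)
        # Insert if the cell is empty or the child beats the current occupant.
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, child)
    return archive


if __name__ == "__main__":
    for cell, (fitness, candidate) in sorted(quality_diversity_search().items()):
        print(cell, round(fitness, 2), candidate[2])
```

The archive is what separates this from plain optimisation: rather than returning a single best candidate, the loop returns one elite per cell, so coverage across the descriptor grid (diversity) and the score within each cell (quality) are pursued together.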
