Title: Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models

Feb 14, 2023

Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro

Abstract: Pretrained large language models have become indispensable for solving various natural language processing (NLP) tasks. However, safely deploying them in real-world applications is challenging because they can generate toxic content. To address this challenge, we propose two novel pretraining data augmentation strategies that significantly reduce model toxicity without compromising utility: (1) MEDA, which adds the raw toxicity score as metadata to the pretraining samples, and (2) INST, which adds instructions to those samples indicating their toxicity. Our results indicate that our best-performing strategy (INST) reduces the toxicity probability by up to 61% while preserving accuracy on five benchmark NLP tasks and improving AUC scores on four bias detection tasks by 1.3%. We also demonstrate the generalizability of our techniques by scaling the number of training samples and the number of model parameters.
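The two augmentation strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Sample` type, the prefix wording, the two-decimal score format, and the 0.5 threshold are all assumptions made for illustration, and the toxicity score is assumed to come from an external classifier that returns a value in [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """A pretraining sample with a precomputed toxicity score (hypothetical type)."""
    text: str
    toxicity: float  # assumed to lie in [0, 1], produced by some external classifier


# Hypothetical cutoff for calling a sample toxic; the abstract does not state one.
TOXICITY_THRESHOLD = 0.5


def augment_meda(sample: Sample) -> str:
    """MEDA-style augmentation: prepend the raw toxicity score as metadata.

    The "toxicity: <score>" prefix format is an assumption.
    """
    return f"toxicity: {sample.toxicity:.2f}\n{sample.text}"


def augment_inst(sample: Sample, threshold: float = TOXICITY_THRESHOLD) -> str:
    """INST-style augmentation: prepend a natural-language instruction
    stating whether the sample is toxic.

    The instruction wording and the thresholding rule are assumptions.
    """
    label = "toxic" if sample.toxicity >= threshold else "non-toxic"
    return f"This is a {label} post.\n{sample.text}"


if __name__ == "__main__":
    example = Sample(text="Have a great day!", toxicity=0.02)
    print(augment_meda(example))
    print(augment_inst(example))
```

Under these assumptions, a model pretrained on such prefixed text could be steered at inference time by prompting it with the low-toxicity metadata or the non-toxic instruction.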

* This paper will be presented at EACL 2023
