Archana Ahlawat

De-amplifying Bias from Differential Privacy in Language Model Fine-tuning

Feb 07, 2024

Sanjari Srivastava, Piotr Mardziel, Zhikhun Zhang, Archana Ahlawat, Anupam Datta, John C Mitchell

Figures 1-4 for De-amplifying Bias from Differential Privacy in Language Model Fine-tuning

Abstract: Fairness and privacy are two important values machine learning (ML) practitioners often seek to operationalize in models. Fairness aims to reduce model bias for social/demographic sub-groups. Privacy via differential privacy (DP) mechanisms, on the other hand, limits the impact of any individual's training data on the resulting model. The trade-offs between privacy and fairness goals of trustworthy ML pose a challenge to those wishing to address both. We show that DP amplifies gender, racial, and religious bias when fine-tuning large language models (LLMs), producing models more biased than ones fine-tuned without DP. We find the cause of the amplification to be a disparity in convergence of gradients across sub-groups. Through the case of binary gender bias, we demonstrate that Counterfactual Data Augmentation (CDA), a known method for addressing bias, also mitigates bias amplification by DP. As a consequence, DP and CDA together can be used to fine-tune models while maintaining both fairness and privacy.
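To make the idea of Counterfactual Data Augmentation concrete for the binary gender case, here is a minimal Python sketch. It is not the paper's procedure, which the abstract does not describe; it only illustrates the general technique of pairing each training sentence with a gender-swapped copy. The word list `GENDER_PAIRS` and the helper names `counterfactual` and `augment` are hypothetical, and ambiguous pronouns such as "her" (which can map to "him" or "his") are deliberately left out.

```python
import re

# Hypothetical, deliberately tiny word-pair list (an assumption for illustration).
# A real CDA setup would use a curated lexicon and handle ambiguous words
# such as "her" -> "him"/"his", which this sketch omits.
GENDER_PAIRS = [
    ("he", "she"),
    ("man", "woman"),
    ("men", "women"),
    ("father", "mother"),
    ("son", "daughter"),
    ("boy", "girl"),
]


def build_swap_table(pairs):
    """Map each word to its counterpart in both directions."""
    table = {}
    for a, b in pairs:
        table[a] = b
        table[b] = a
    return table


SWAP = build_swap_table(GENDER_PAIRS)
_WORD = re.compile(r"\b[A-Za-z]+\b")


def counterfactual(sentence):
    """Return the sentence with every listed gendered word swapped."""

    def repl(match):
        word = match.group(0)
        target = SWAP.get(word.lower())
        if target is None:
            return word  # not a listed gendered word: leave unchanged
        # Preserve the capitalization pattern of the original word.
        if word.isupper() and len(word) > 1:
            return target.upper()
        if word[0].isupper():
            return target.capitalize()
        return target

    return _WORD.sub(repl, sentence)


def augment(corpus):
    """Return the corpus plus a swapped copy of each sentence that changes."""
    augmented = []
    for sentence in corpus:
        augmented.append(sentence)
        swapped = counterfactual(sentence)
        if swapped != sentence:
            augmented.append(swapped)
    return augmented
```

Under these assumptions, `augment(["The man left.", "It rained."])` keeps both originals and adds one swapped sentence for the first. The augmented corpus would then be passed to whatever DP fine-tuning pipeline is in use; that step is outside this sketch.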
