Alejandro Salinas

Breaking Down Bias: On The Limits of Generalizable Pruning Strategies

Feb 11, 2025

Sibo Ma, Alejandro Salinas, Peter Henderson, Julian Nyarko

Abstract: We employ model pruning to examine how LLMs conceptualize racial biases, and whether a generalizable mitigation strategy for such biases appears feasible. Our analysis yields several novel insights. We find that pruning can be an effective method to reduce bias without significantly increasing anomalous model behavior. Neuron-based pruning strategies generally yield better results than approaches pruning entire attention heads. However, our results also show that the effectiveness of either approach quickly deteriorates as pruning strategies become more generalized. For instance, a model that is trained on removing racial biases in the context of financial decision-making poorly generalizes to biases in commercial transactions. Overall, our analysis suggests that racial biases are only partially represented as a general concept within language models. The other part of these biases is highly context-specific, suggesting that generalizable mitigation strategies may be of limited effectiveness. Our findings have important implications for legal frameworks surrounding AI. In particular, they suggest that an effective mitigation strategy should include the allocation of legal responsibility on those that deploy models in a specific use case.

* 28 pages, 9 figures, 1 table


What's in a Name? Auditing Large Language Models for Race and Gender Bias

Feb 29, 2024

Amit Haim, Alejandro Salinas, Julian Nyarko

Abstract: We employ an audit design to investigate biases in state-of-the-art large language models, including GPT-4. In our study, we prompt the models for advice involving a named individual across a variety of scenarios, such as during car purchase negotiations or election outcome predictions. We find that the advice systematically disadvantages names that are commonly associated with racial minorities and women. Names associated with Black women receive the least advantageous outcomes. The biases are consistent across 42 prompt templates and several models, indicating a systemic issue rather than isolated incidents. While providing numerical, decision-relevant anchors in the prompt can successfully counteract the biases, qualitative details have inconsistent effects and may even increase disparities. Our findings underscore the importance of conducting audits at the point of LLM deployment and implementation to mitigate their potential for harm against marginalized communities.

* 34 pages, 9 tables, 11 figures
