Title: BERTScore is Unfair: On Social Bias in Language Model-Based Metrics for Text Generation

Oct 14, 2022

Tianxiang Sun, Junliang He, Xipeng Qiu, Xuanjing Huang

Abstract: Automatic evaluation metrics are crucial to the development of generative systems. In recent years, metrics based on pre-trained language models (PLMs), such as BERTScore, have been widely adopted across generation tasks. However, PLMs have been shown to encode a range of stereotypical societal biases, which raises concerns about the fairness of PLMs used as metrics. To address this, this work presents the first systematic study of social bias in PLM-based metrics. We demonstrate that popular PLM-based metrics exhibit significantly higher social bias than traditional metrics across six sensitive attributes: race, gender, religion, physical appearance, age, and socioeconomic status. In-depth analysis suggests that the choice of metric paradigm (matching, regression, or generation) has a greater impact on fairness than the choice of PLM. In addition, we develop debiasing adapters that are injected into PLM layers, mitigating bias in PLM-based metrics while retaining high performance for evaluating text generation.
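BERTScore, the metric named in the title, is an example of the matching paradigm mentioned in the abstract. As a rough illustration of what "matching" means, the sketch below computes a BERTScore-style F1 by greedily matching token embeddings with cosine similarity. It assumes the token embeddings are already computed and passed in as plain lists of floats (in practice they come from a PLM layer), and it omits the IDF weighting and baseline rescaling options of the real implementation. The function names `cosine` and `bertscore_f1` are hypothetical, and this is not the authors' code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bertscore_f1(candidate: List[Vector], reference: List[Vector]) -> float:
    """BERTScore-style F1 from precomputed token embeddings (no IDF, no rescaling)."""
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(cosine(c, r) for r in reference) for c in candidate) / len(candidate)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(cosine(r, c) for c in candidate) for r in reference) / len(reference)
    # Guard against division by zero when nothing matches.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example with 2-dimensional "embeddings":
# the candidate has one token matching the reference exactly and one unrelated token.
candidate = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 0.0]]
print(bertscore_f1(candidate, reference))  # precision 0.5, recall 1.0, so F1 = 2/3
```

Because the score depends only on the PLM's embeddings, any social bias encoded in those embeddings can, in principle, shift the score between candidates that differ only in a word tied to a sensitive attribute.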

* Accepted to EMNLP 2022 (main conference). Data and code are available at https://github.com/txsun1997/Metric-Fairness
