Abstract: Neural Machine Translation (NMT) continues to improve in quality and adoption, yet the inadvertent perpetuation of gender bias remains a significant concern. Despite numerous studies on gender bias in translations into English from weakly gendered languages, there are no benchmarks for evaluating this phenomenon or for assessing mitigation strategies. To address this gap, we introduce GATE X-E, an extension to the GATE (Rarrick et al., 2023) corpus, which consists of human translations from Turkish, Hungarian, Finnish, and Persian into English. Each translation is accompanied by feminine, masculine, and neutral variants. The dataset, which contains between 1250 and 1850 instances for each of the four language pairs, features natural sentences with a wide range of sentence lengths and domains, challenging translation rewriters on various linguistic phenomena. Additionally, we present a translation gender rewriting solution built with GPT-4 and use GATE X-E to evaluate it. We open source our contributions to encourage further research on gender debiasing.
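To make the structure of the corpus concrete, here is a minimal sketch of how a single GATE X-E instance could be represented in Python; the field names and the Turkish example are illustrative assumptions, not the released data schema.

# Hypothetical representation of a single GATE X-E instance.
# Field names are illustrative assumptions, not the released schema.
from dataclasses import dataclass
from typing import List

@dataclass
class GateXEInstance:
    source: str          # sentence in Turkish, Hungarian, Finnish, or Persian
    translation: str     # original human translation into English
    feminine: List[str]  # feminine variant(s) of the translation
    masculine: List[str] # masculine variant(s)
    neutral: List[str]   # gender-neutral variant(s)

# Illustrative example: Turkish "o" is gender-ambiguous, so the English
# translation admits feminine, masculine, and neutral readings.
example = GateXEInstance(
    source="O bir doktor.",
    translation="He is a doctor.",
    feminine=["She is a doctor."],
    masculine=["He is a doctor."],
    neutral=["They are a doctor."],
)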
Abstract: Recent advances in neural methods have led to substantial improvement in the quality of Neural Machine Translation (NMT) systems. However, these systems frequently produce translations with inaccurate gender (Stanovsky et al., 2019), which can be traced to bias in the training data. Saunders and Byrne (2020) tackle this problem with a handcrafted dataset containing balanced gendered profession words. By using this data to fine-tune an existing NMT model, they show that gender bias can be significantly mitigated, albeit at the expense of translation quality due to catastrophic forgetting. They recover some of the lost quality with modified training objectives or additional models at inference. We find, however, that simply supplementing the handcrafted dataset with a random sample from the base model's training corpus is enough to significantly reduce catastrophic forgetting. We also propose a novel domain-adaptation technique that leverages in-domain data created with the counterfactual data generation techniques proposed by Zmigrod et al. (2019) to further improve accuracy on the WinoMT challenge test set without significant loss in translation quality. We show its effectiveness in NMT systems from English into three morphologically rich languages: French, Spanish, and Italian. The relevant dataset and code will be available on GitHub.
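The data-mixing remedy described above, supplementing the small handcrafted gender-balanced set with a random sample of the base training corpus before fine-tuning, can be sketched as follows; the file names, sample size, and helper name are assumptions made for illustration, not the authors' released code.

# Sketch of the fine-tuning mixture: handcrafted gender-balanced data plus a
# random sample of the base model's training corpus, which reduces
# catastrophic forgetting during fine-tuning.
import random

def build_finetuning_mix(handcrafted_path: str, base_corpus_path: str,
                         n_base_samples: int = 50_000, seed: int = 0) -> list:
    with open(handcrafted_path, encoding="utf-8") as f:
        handcrafted = f.read().splitlines()
    with open(base_corpus_path, encoding="utf-8") as f:
        base = f.read().splitlines()
    rng = random.Random(seed)
    mix = handcrafted + rng.sample(base, min(n_base_samples, len(base)))
    rng.shuffle(mix)
    return mix

# mix = build_finetuning_mix("handcrafted_professions.tsv", "base_corpus.tsv")
# The mixed data would then be used to fine-tune the existing NMT model.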
Abstract: Machine Translation (MT) continues to improve in quality and adoption, yet the inadvertent perpetuation of gender bias remains a significant concern. Despite numerous studies of gender bias in translations from gender-neutral languages such as Turkish into more strongly gendered languages like English, there are no benchmarks for evaluating this phenomenon or for assessing mitigation strategies. To address this gap, we introduce GATE X-E, an extension to the GATE (Rarrick et al., 2023) corpus, which consists of human translations from Turkish, Hungarian, Finnish, and Persian into English. Each translation is accompanied by feminine, masculine, and neutral variants for each possible gender interpretation. The dataset, which contains between 1250 and 1850 instances for each of the four language pairs, features natural sentences with a wide range of sentence lengths and domains, challenging translation rewriters on various linguistic phenomena. Additionally, we present an English gender rewriting solution built on GPT-3.5 Turbo and use GATE X-E to evaluate it. We open source our contributions to encourage further research on gender debiasing.
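As an illustration of how such a rewriter might be driven, here is a minimal sketch of prompting GPT-3.5 Turbo through the OpenAI Python client; the prompt wording and the helper function are assumptions and do not reproduce the prompt used in the paper.

# Illustrative sketch of an LLM-based English gender rewriter; the prompt text
# is an assumption, not the prompt used in the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def rewrite_gender(english_translation: str, target_gender: str) -> str:
    """Rewrite words referring to the gender-ambiguous source entity."""
    prompt = (
        "The following English sentence was translated from a language in which "
        "the gender of the main person is ambiguous. Rewrite it so that this "
        f"person is referred to with {target_gender} forms, changing only the "
        "words that must change.\n\n"
        f"Sentence: {english_translation}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

# rewrite_gender("He is a doctor.", "feminine")  ->  "She is a doctor."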
Abstract: Although recent years have brought significant progress in improving translation of unambiguously gendered sentences, translation of ambiguously gendered input remains relatively unexplored. When source gender is ambiguous, machine translation models typically default to stereotypical gender roles, perpetuating harmful bias. Recent work has led to the development of "gender rewriters" that generate alternative gender translations for such ambiguous inputs, but such systems are plagued by poor linguistic coverage. To encourage better performance on this task, we present and release GATE, a linguistically diverse corpus of gender-ambiguous source sentences along with multiple alternative target language translations. We also provide tools for evaluation and system analysis when using GATE and use them to evaluate our translation rewriter system.
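One simple way to score a rewriter against GATE-style references, where each ambiguous source sentence comes with several acceptable target alternatives, is sketched below; the exact-match criterion is an assumption for illustration rather than the evaluation protocol shipped with GATE.

# Sketch of scoring a rewriter against multiple acceptable reference
# alternatives per source sentence; exact match is an illustrative choice.
def rewriter_accuracy(system_outputs: list, reference_alternatives: list) -> float:
    """Fraction of outputs that match one of the acceptable alternatives."""
    hits = sum(
        output.strip() in {ref.strip() for ref in refs}
        for output, refs in zip(system_outputs, reference_alternatives)
    )
    return hits / len(system_outputs) if system_outputs else 0.0

# rewriter_accuracy(["She is a doctor."],
#                   [["She is a doctor.", "They are a doctor."]])  ->  1.0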