


MiTTenS: A Dataset for Evaluating Misgendering in Translation

Jan 13, 2024

Kevin Robinson, Sneha Kudugunta, Romina Stella, Sunipa Dev, Jasmijn Bastings

Figures 1–4 for MiTTenS: A Dataset for Evaluating Misgendering in Translation

Abstract: Misgendering is the act of referring to someone in a way that does not reflect their gender identity. Translation systems, including foundation models capable of translation, can produce errors that result in misgendering harms. To measure the extent of such potential harms when translating into and out of English, we introduce a dataset, MiTTenS, covering 26 languages from a variety of language families and scripts, including several traditionally underrepresented in digital resources. The dataset is constructed with handcrafted passages that target known failure patterns, longer synthetically generated passages, and natural passages sourced from multiple domains. We demonstrate the usefulness of the dataset by evaluating both dedicated neural machine translation systems and foundation models, and show that all systems exhibit errors resulting in misgendering harms, even in high-resource languages.

* GitHub repository https://github.com/google-research-datasets/mittens
