Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Rob Chew

Correcting Annotator Bias in Training Data: Population-Aligned Instance Replication (PAIR)

Jan 12, 2025

Stephanie Eckman, Bolei Ma, Christoph Kern, Rob Chew, Barbara Plank, Frauke Kreuter

Abstract:Models trained on crowdsourced labels may not reflect broader population views when annotator pools are not representative. Since collecting representative labels is challenging, we propose Population-Aligned Instance Replication (PAIR), a method to address this bias through statistical adjustment. Using a simulation study of hate speech and offensive language detection, we create two types of annotators with different labeling tendencies and generate datasets with varying proportions of the types. Models trained on unbalanced annotator pools show poor calibration compared to those trained on representative data. However, PAIR, which duplicates labels from underrepresented annotator groups to match population proportions, significantly reduces bias without requiring new data collection. These results suggest statistical techniques from survey research can help align model training with target populations even when representative annotator pools are unavailable. We conclude with three practical recommendations for improving training data quality.

Via

Access Paper or Ask Questions

Annotation Sensitivity: Training Data Collection Methods Affect Model Performance

Nov 23, 2023

Christoph Kern, Stephanie Eckman, Jacob Beck, Rob Chew, Bolei Ma, Frauke Kreuter

Figure 1 for Annotation Sensitivity: Training Data Collection Methods Affect Model Performance

Figure 2 for Annotation Sensitivity: Training Data Collection Methods Affect Model Performance

Figure 3 for Annotation Sensitivity: Training Data Collection Methods Affect Model Performance

Figure 4 for Annotation Sensitivity: Training Data Collection Methods Affect Model Performance

Abstract:When training data are collected from human annotators, the design of the annotation instrument, the instructions given to annotators, the characteristics of the annotators, and their interactions can impact training data. This study demonstrates that design choices made when creating an annotation instrument also impact the models trained on the resulting annotations. We introduce the term annotation sensitivity to refer to the impact of annotation data collection methods on the annotations themselves and on downstream model performance and predictions. We collect annotations of hate speech and offensive language in five experimental conditions of an annotation instrument, randomly assigning annotators to conditions. We then fine-tune BERT models on each of the five resulting datasets and evaluate model performance on a holdout portion of each condition. We find considerable differences between the conditions for 1) the share of hate speech/offensive language annotations, 2) model performance, 3) model predictions, and 4) model learning curves. Our results emphasize the crucial role played by the annotation instrument which has received little attention in the machine learning literature. We call for additional research into how and why the instrument impacts the annotations to inform the development of best practices in instrument design.

* EMNLP 2023 Findings

Via

Access Paper or Ask Questions

SMART: An Open Source Data Labeling Platform for Supervised Learning

Dec 11, 2018

Rob Chew, Michael Wenger, Caroline Kery, Jason Nance, Keith Richards, Emily Hadley, Peter Baumgartner

Figure 1 for SMART: An Open Source Data Labeling Platform for Supervised Learning

Abstract:SMART is an open source web application designed to help data scientists and research teams efficiently build labeled training data sets for supervised machine learning tasks. SMART provides users with an intuitive interface for creating labeled data sets, supports active learning to help reduce the required amount of labeled data, and incorporates inter-rater reliability statistics to provide insight into label quality. SMART is designed to be platform agnostic and easily deployable to meet the needs of as many different research teams as possible. The project website contains links to the code repository and extensive user documentation.

* 5 pages, 1 figure

Via

Access Paper or Ask Questions