Ian D. Kivlichan

CrowdWorkSheets: Accounting for Individual and Collective Identities Underlying Crowdsourced Dataset Annotation

Jun 09, 2022

Mark Diaz, Ian D. Kivlichan, Rachel Rosen, Dylan K. Baker, Razvan Amironesei, Vinodkumar Prabhakaran, Emily Denton

Abstract: Human-annotated data plays a crucial role in machine learning (ML) research and development. However, the ethical considerations around the processes and decisions that go into dataset annotation have not received nearly enough attention. In this paper, we survey an array of literature that provides insights into ethical considerations around crowdsourced dataset annotation. We synthesize these insights and lay out the challenges in this space along two layers: (1) who the annotator is, and how the annotators' lived experiences can impact their annotations, and (2) the relationship between the annotators and the crowdsourcing platforms, and what that relationship affords them. Finally, we introduce a novel framework, CrowdWorkSheets, for dataset developers to facilitate transparent documentation of key decision points at various stages of the data annotation pipeline: task formulation, selection of annotators, platform and infrastructure choices, dataset analysis and evaluation, and dataset release and maintenance.

* 11 pages, Accepted at 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT). arXiv admin note: text overlap with arXiv:2112.04554

Measuring and Improving Model-Moderator Collaboration using Uncertainty Estimation

Jul 09, 2021

Ian D. Kivlichan, Zi Lin, Jeremiah Liu, Lucy Vasserman

Abstract: Content moderation is often performed by a collaboration between humans and machine learning models. However, it is not well understood how to design the collaborative process so as to maximize the combined moderator-model system performance. This work presents a rigorous study of this problem, focusing on an approach that incorporates model uncertainty into the collaborative process. First, we introduce principled metrics to describe the performance of the collaborative system under capacity constraints on the human moderator, quantifying how efficiently the combined system utilizes human decisions. Using these metrics, we conduct a large benchmark study evaluating the performance of state-of-the-art uncertainty models under different collaborative review strategies. We find that an uncertainty-based strategy consistently outperforms the widely used strategy based on toxicity scores, and moreover that the choice of review strategy drastically changes the overall system performance. Our results demonstrate the importance of rigorous metrics for understanding and developing effective moderator-model systems for content moderation, as well as the utility of uncertainty estimation in this domain.

* WOAH 2021
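
The contrast the abstract draws between a toxicity-score review strategy and an uncertainty-based one can be illustrated with a toy sketch. This is not the paper's code or its exact metrics: the function name `collaborative_accuracy`, the use of `p * (1 - p)` as the uncertainty score, the 0.5 decision threshold, the assumption that the human reviewer is always correct, and the example scores and labels are all made up for illustration.

```python
def collaborative_accuracy(scores, labels, budget, strategy):
    """Accuracy of a model-plus-moderator system on a toy dataset.

    Assumptions (illustrative only): the moderator reviews at most
    `budget` items and always labels them correctly; every other item
    keeps the model's decision, thresholded at 0.5.
    """
    if strategy == "toxicity":
        # Send the items the model scores as most toxic to the human.
        priority = list(scores)
    elif strategy == "uncertainty":
        # Send the items the model is least sure about; p * (1 - p)
        # peaks at p = 0.5 and is one simple stand-in for uncertainty.
        priority = [p * (1.0 - p) for p in scores]
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    # Indices sorted by descending review priority.
    order = sorted(range(len(scores)), key=lambda i: priority[i], reverse=True)
    reviewed = set(order[:budget])

    correct = 0
    for i, (p, y) in enumerate(zip(scores, labels)):
        if i in reviewed:
            correct += 1  # human decision, assumed correct
        else:
            correct += int((p >= 0.5) == bool(y))  # model decision
    return correct / len(scores)


# Hypothetical toy data: the model is wrong only on the two borderline items.
scores = [0.95, 0.90, 0.55, 0.45, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0]

for name in ("toxicity", "uncertainty"):
    print(name, collaborative_accuracy(scores, labels, budget=2, strategy=name))
```

In this toy setting the toxicity-based strategy spends its two reviews on items the model already classifies correctly, while the uncertainty-based strategy spends them on the two borderline items the model gets wrong. Real systems differ in many ways, for example imperfect reviewers and calibrated uncertainty estimates.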