Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bilal Porgali

The Casual Conversations v2 Dataset

Mar 08, 2023

Bilal Porgali, Vítor Albiero, Jordan Ryda, Cristian Canton Ferrer, Caner Hazirbas

Figure 1 for The Casual Conversations v2 Dataset

Figure 2 for The Casual Conversations v2 Dataset

Figure 3 for The Casual Conversations v2 Dataset

Figure 4 for The Casual Conversations v2 Dataset

Abstract:This paper introduces a new large consent-driven dataset aimed at assisting in the evaluation of algorithmic bias and robustness of computer vision and audio speech models in regards to 11 attributes that are self-provided or labeled by trained annotators. The dataset includes 26,467 videos of 5,567 unique paid participants, with an average of almost 5 videos per person, recorded in Brazil, India, Indonesia, Mexico, Vietnam, Philippines, and the USA, representing diverse demographic characteristics. The participants agreed for their data to be used in assessing fairness of AI models and provided self-reported age, gender, language/dialect, disability status, physical adornments, physical attributes and geo-location information, while trained annotators labeled apparent skin tone using the Fitzpatrick Skin Type and Monk Skin Tone scales, and voice timbre. Annotators also labeled for different recording setups and per-second activity annotations.

Via

Access Paper or Ask Questions

Casual Conversations v2: Designing a large consent-driven dataset to measure algorithmic bias and robustness

Nov 10, 2022

Caner Hazirbas, Yejin Bang, Tiezheng Yu, Parisa Assar, Bilal Porgali, Vítor Albiero, Stefan Hermanek, Jacqueline Pan, Emily McReynolds, Miranda Bogen(+2 more)

Figure 1 for Casual Conversations v2: Designing a large consent-driven dataset to measure algorithmic bias and robustness

Figure 2 for Casual Conversations v2: Designing a large consent-driven dataset to measure algorithmic bias and robustness

Figure 3 for Casual Conversations v2: Designing a large consent-driven dataset to measure algorithmic bias and robustness

Figure 4 for Casual Conversations v2: Designing a large consent-driven dataset to measure algorithmic bias and robustness

Abstract:Developing robust and fair AI systems require datasets with comprehensive set of labels that can help ensure the validity and legitimacy of relevant measurements. Recent efforts, therefore, focus on collecting person-related datasets that have carefully selected labels, including sensitive characteristics, and consent forms in place to use those attributes for model testing and development. Responsible data collection involves several stages, including but not limited to determining use-case scenarios, selecting categories (annotations) such that the data are fit for the purpose of measuring algorithmic bias for subgroups and most importantly ensure that the selected categories/subcategories are robust to regional diversities and inclusive of as many subgroups as possible. Meta, in a continuation of our efforts to measure AI algorithmic bias and robustness (https://ai.facebook.com/blog/shedding-light-on-fairness-in-ai-with-a-new-data-set), is working on collecting a large consent-driven dataset with a comprehensive list of categories. This paper describes our proposed design of such categories and subcategories for Casual Conversations v2.

Via

Access Paper or Ask Questions