Abstract: The unequal representation of different groups in a sample population can lead to discrimination against minority groups when machine learning models make automated decisions. To address these issues, fairness-aware machine learning jointly optimizes two (or more) objectives targeting predictive effectiveness and low unfairness. However, the inherent under-representation of minorities in the data makes the disparate treatment of subpopulations less noticeable and harder to address during learning. In this paper, we propose a novel adversarial reweighting method to address such \emph{representation bias}. To balance the data distribution between the majority and the minority groups, our approach deemphasizes samples from the majority group. To minimize empirical risk, our method prefers samples from the majority group that are close to the minority group, as evaluated by the Wasserstein distance. Our theoretical analysis shows the effectiveness of our adversarial reweighting approach. Experiments demonstrate that our approach mitigates bias without sacrificing classification accuracy, outperforming related state-of-the-art methods on image and tabular benchmark datasets.
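The abstract describes the reweighting idea only at a high level. As a rough illustration, and not the paper's actual algorithm, the following Python sketch downweights the majority group so that its total weight matches the minority group's size, while giving relatively larger weights to majority samples that lie close to the minority group; the nearest-neighbor distance is used here as a crude stand-in for the Wasserstein distance, and the function name, the `temperature` parameter, and the synthetic data are all hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist

def reweight_majority(X_maj, X_min, temperature=1.0):
    """Toy reweighting sketch: majority samples closer to the minority group
    receive larger relative weights, and the majority group's total weight is
    rescaled to the minority group's size to balance the two groups."""
    # Distance from each majority sample to its nearest minority sample
    # (a crude proxy for proximity under the Wasserstein distance).
    d = cdist(X_maj, X_min).min(axis=1)
    # Closer majority samples get exponentially more weight.
    w = np.exp(-d / temperature)
    # Rescale so the majority group's total weight equals the minority size.
    return w / w.sum() * len(X_min)

# Usage: the weights could be passed as sample_weight to a standard classifier.
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(200, 5))  # majority-group features
X_min = rng.normal(0.5, 1.0, size=(30, 5))   # minority-group features
weights = reweight_majority(X_maj, X_min)
print(weights.min(), weights.max(), weights.sum())
```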
Abstract: Computer Vision (CV) has achieved remarkable results, outperforming humans in several tasks. Nonetheless, it may result in major discrimination if not handled with proper care. CV systems depend heavily on the data they are fed and can learn and amplify biases within such data. Thus, the problems of both understanding and discovering biases are of utmost importance. Yet, to date, there is no comprehensive survey on bias in visual datasets. To this end, this work aims to: i) describe the biases that can affect visual datasets; ii) review the literature on methods for bias discovery and quantification in visual datasets; iii) discuss existing attempts to collect bias-aware visual datasets. A key conclusion of our study is that the problem of bias discovery and quantification in visual datasets is still open, and there is room for improvement in terms of both the methods and the range of biases that can be addressed. Moreover, there is no such thing as a bias-free dataset, so scientists and practitioners must become aware of the biases in their datasets and make them explicit. To this end, we propose a checklist that can be used to spot different types of bias during visual dataset collection.