Montreal AI Ethics Institute
Abstract: We present Shades-of-NULL, a benchmark for responsible missing value imputation. Our benchmark includes state-of-the-art imputation techniques, and embeds them into the machine learning development lifecycle. We model realistic missingness scenarios that go beyond Rubin's classic Missing Completely at Random (MCAR), Missing At Random (MAR) and Missing Not At Random (MNAR), to include multi-mechanism missingness (when different missingness patterns co-exist in the data) and missingness shift (when the missingness mechanism changes between training and test). Another key novelty of our work is that we evaluate imputers holistically, based on the predictive performance, fairness and stability of the models that are trained and tested on the data they produce. We use Shades-of-NULL to conduct a large-scale empirical study involving 20,952 experimental pipelines, and find that, while there is no single best-performing imputation approach for all missingness types, interesting patterns do emerge when comparing imputers across simpler and more complex missingness scenarios. Further, while predictive performance, fairness and stability can be seen as orthogonal, we identify trade-offs among them that arise due to the combination of missingness scenario, the choice of an imputer, and the architecture of the model trained on the data post-imputation. We make Shades-of-NULL publicly available, and hope to enable researchers to comprehensively and rigorously evaluate new missing value imputation methods on a wide range of evaluation metrics, in plausible and socially meaningful missingness scenarios.
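To make the missingness scenarios concrete, here is a minimal sketch in our own illustrative code (not the benchmark's API), using scikit-learn's mean imputer as a stand-in for the imputation techniques the benchmark actually evaluates. It injects MCAR, MAR and MNAR missingness into a synthetic dataset and measures downstream accuracy, including a missingness-shift case where the train and test mechanisms differ:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def inject_missingness(X, col, mechanism, rate=0.3):
    """Return a copy of X with NaNs injected into column `col`."""
    X = X.copy()
    if mechanism == "MCAR":    # missingness independent of all data values
        mask = rng.random(len(X)) < rate
    elif mechanism == "MAR":   # missingness depends on another, fully observed column
        other = X[:, (col + 1) % X.shape[1]]
        mask = other > np.quantile(other, 1 - rate)
    elif mechanism == "MNAR":  # missingness depends on the value that goes missing
        mask = X[:, col] > np.quantile(X[:, col], 1 - rate)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    X[mask, col] = np.nan
    return X

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Missingness shift: pair different mechanisms at train vs. test time,
# e.g. ("MCAR", "MNAR"); identical pairs give the classic single-mechanism case.
for train_mech, test_mech in [("MCAR", "MCAR"), ("MNAR", "MNAR"), ("MCAR", "MNAR")]:
    X_tr_m = inject_missingness(X_tr, col=0, mechanism=train_mech)
    X_te_m = inject_missingness(X_te, col=0, mechanism=test_mech)
    imputer = SimpleImputer(strategy="mean").fit(X_tr_m)
    model = RandomForestClassifier(random_state=0).fit(imputer.transform(X_tr_m), y_tr)
    acc = accuracy_score(y_te, model.predict(imputer.transform(X_te_m)))
    print(f"train={train_mech} test={test_mech}: accuracy = {acc:.3f}")
```

Multi-mechanism missingness can be simulated analogously by applying different mechanisms to different columns of the same dataset.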
Abstract: In this paper we revisit the bias-variance decomposition of model error from the perspective of designing a fair classifier: we are motivated by the widely held socio-technical belief that noise variance in large datasets in social domains tracks demographic characteristics such as gender, race, disability, etc. We propose a conditional-iid (ciid) model built from group-specific classifiers that seeks to improve on the trade-offs made by a single model (iid setting). We theoretically analyze the bias-variance decomposition of different models under a Gaussian Mixture Model, and then empirically test our setup on the COMPAS and folktables datasets. We instantiate the ciid model with two procedures that improve "fairness" by conditioning out undesirable effects: first, by conditioning directly on sensitive attributes, and second, by clustering samples into groups and conditioning on cluster membership (blind to protected group membership). Our analysis suggests that there might be principled procedures and concrete real-world use cases under which conditional models are preferred, and our empirical results strongly indicate that non-iid settings, such as the ciid setting proposed here, might be more suitable for big data applications in social contexts.
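As a rough illustration of the ciid idea (a minimal sketch under our own assumptions, not the paper's implementation), the following compares a single pooled (iid) classifier against group-specific classifiers on synthetic data whose label noise variance differs across two groups:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class ConditionalIIDClassifier:
    """Fit one classifier per group and route each sample to its group's model (ciid)."""
    def __init__(self, base=LogisticRegression):
        self.base = base
        self.models = {}

    def fit(self, X, y, groups):
        for g in np.unique(groups):
            idx = groups == g
            self.models[g] = self.base().fit(X[idx], y[idx])
        return self

    def predict(self, X, groups):
        out = np.empty(len(X), dtype=int)
        for g, model in self.models.items():
            idx = groups == g
            if idx.any():
                out[idx] = model.predict(X[idx])
        return out

# Synthetic data: two groups with unequal noise variance in the labels
rng = np.random.default_rng(0)
n = 4000
groups = rng.integers(0, 2, n)
X = rng.normal(size=(n, 5)) + groups[:, None] * 0.5        # group-shifted features
w = rng.normal(size=5)
noise = rng.normal(scale=np.where(groups == 0, 0.5, 2.0))  # group-dependent noise
y = ((X @ w + noise) > 0).astype(int)

pooled = LogisticRegression().fit(X, y)              # single iid model
ciid = ConditionalIIDClassifier().fit(X, y, groups)  # conditional (ciid) model
for g in (0, 1):
    idx = groups == g
    print(f"group {g}: pooled acc = {pooled.score(X[idx], y[idx]):.3f}, "
          f"ciid acc = {(ciid.predict(X[idx], groups[idx]) == y[idx]).mean():.3f}")
```

The paper's second, group-blind procedure would replace `groups` with cluster assignments (e.g., from k-means) rather than the sensitive attribute itself.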
Abstract: The error of an estimator can be decomposed into a (statistical) bias term, a variance term, and an irreducible noise term. When we do bias analysis, we are formally asking the question: "how good are the predictions?" The role of bias in the error decomposition is clear: if we trust the labels/targets, then we would want the estimator to have as low bias as possible, in order to minimize error. Fair machine learning is concerned with the question: "Are the predictions equally good for different demographic/social groups?" This has naturally led to a variety of fairness metrics that compare some measure of statistical bias on subsets corresponding to socially privileged and socially disadvantaged groups. In this paper we propose a new family of performance measures based on group-wise parity in variance. We demonstrate when group-wise statistical bias analysis gives an incomplete picture, and what group-wise variance analysis can tell us in settings that differ in the magnitude of statistical bias. We develop and release an open-source library that reconciles uncertainty quantification techniques with fairness analysis, and use it to conduct an extensive empirical analysis of our variance-based fairness measures on standard benchmarks.
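The released library is not reproduced here; the sketch below is only a minimal illustration of the underlying idea (all names and modeling choices are our own), using a bootstrap ensemble to estimate the group-wise variance of a classifier's predictions and compare it across groups:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
groups = rng.integers(0, 2, len(y))  # synthetic sensitive attribute
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(X, y, groups, random_state=0)

# Bootstrap an ensemble and record each model's probability predictions
n_boot = 50
preds = np.empty((n_boot, len(X_te)))
for b in range(n_boot):
    idx = rng.integers(0, len(X_tr), len(X_tr))  # resample the training data
    model = DecisionTreeClassifier().fit(X_tr[idx], y_tr[idx])
    preds[b] = model.predict_proba(X_te)[:, 1]

per_point_var = preds.var(axis=0)  # variance of the estimator at each test point
for g in (0, 1):
    print(f"group {g}: mean predictive variance = {per_point_var[g_te == g].mean():.4f}")
```

A parity measure can then be defined as the gap or ratio between these group-wise mean variances, analogous to how statistical-bias-based fairness metrics compare group-wise error rates.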
Abstract: In this work we use Equal Opportunity (EO) doctrines from political philosophy to make explicit the normative judgements embedded in different conceptions of algorithmic fairness. We contrast formal EO approaches that narrowly focus on fair contests at discrete decision points, with substantive EO doctrines that look at people's fair life chances more holistically over the course of a lifetime. We use this taxonomy to provide a moral interpretation of the impossibility results as the incompatibility between different conceptions of a fair contest -- forward-looking versus backward-looking -- when people do not have fair life chances. We use this result to motivate substantive conceptions of algorithmic fairness and outline two plausible procedures based on the luck-egalitarian doctrine of EO and Rawls's principle of fair equality of opportunity.
Abstract: Recent interest in codifying fairness in Automated Decision Systems (ADS) has resulted in a wide range of formulations of what it means for an algorithmic system to be fair. Most of these propositions are inspired by, but inadequately grounded in, political philosophy scholarship. This paper aims to correct that deficit. We introduce a taxonomy of fairness ideals using doctrines of Equality of Opportunity (EOP) from political philosophy, clarifying their conceptions in philosophy and their proposed codifications in fair machine learning. We arrange these fairness ideals onto an EOP spectrum, which serves as a useful frame to guide the design of a fair ADS in a given context. We use our fairness-as-EOP framework to re-interpret the impossibility results from a philosophical perspective, as the incompatibility between different value systems, and demonstrate the utility of the framework with several real-world and hypothetical examples. Through our EOP framework we hope to answer what it means for an ADS to be fair from a moral and political philosophy standpoint, and to pave the way for similar scholarship from ethics and legal experts.
Abstract: The 3rd edition of the Montreal AI Ethics Institute's The State of AI Ethics captures the most relevant developments in AI Ethics since October 2020. It aims to help anyone, from machine learning experts to human rights activists and policymakers, quickly digest and understand the field's ever-changing developments. Through research and article summaries, as well as expert commentary, this report distills the research and reporting surrounding various domains related to the ethics of AI, including: algorithmic injustice, discrimination, ethical AI, labor impacts, misinformation, privacy, risk and security, social media, and more. In addition, The State of AI Ethics includes exclusive content written by world-class AI Ethics experts from universities, research institutes, consulting firms, and governments. Unique to this report is "The Abuse and Misogynoir Playbook," written by Dr. Katlyn Turner (Research Scientist, Space Enabled Research Group, MIT), Dr. Danielle Wood (Assistant Professor, Program in Media Arts and Sciences; Assistant Professor, Aeronautics and Astronautics; Lead, Space Enabled Research Group, MIT) and Dr. Catherine D'Ignazio (Assistant Professor, Urban Science and Planning; Director, Data + Feminism Lab, MIT). The piece (and accompanying infographic) is a deep dive into the historical and systematic silencing, erasure, and revision of Black women's contributions to knowledge and scholarship in the United States, and globally. Exposing and countering this Playbook has become increasingly important following the firing of AI Ethics expert Dr. Timnit Gebru (and several of her supporters) at Google. This report should be used not only as a point of reference and insight on the latest thinking in the field of AI Ethics, but also as a tool for introspection as we aim to foster a more nuanced conversation regarding the impacts of AI on the world.