Abstract:This paper introduces the first graph-based framework for generating group counterfactual explanations to audit model fairness, a crucial aspect of trustworthy machine learning. Counterfactual explanations are instrumental in understanding and mitigating unfairness by revealing how inputs should change to achieve a desired outcome. Our framework, named Feasible Group Counterfactual Explanations (FGCEs), captures real-world feasibility constraints and constructs subgroups with similar counterfactuals, setting it apart from existing methods. It also addresses key trade-offs in counterfactual generation, including the balance between the number of counterfactuals, their associated costs, and the breadth of coverage achieved. To evaluate these trade-offs and assess fairness, we propose measures tailored to group counterfactual generation. Our experimental results on benchmark datasets demonstrate the effectiveness of our approach in managing feasibility constraints and trade-offs, as well as the potential of our proposed metrics in identifying and quantifying fairness issues.
Abstract:Algorithmic fairness and explainability are foundational elements for achieving responsible AI. In this paper, we focus on their interplay, a research area that is recently receiving increasing attention. To this end, we first present two comprehensive taxonomies, each representing one of the two complementary fields of study: fairness and explanations. Then, we categorize explanations for fairness into three types: (a) Explanations to enhance fairness metrics, (b) Explanations to help us understand the causes of (un)fairness, and (c) Explanations to assist us in designing methods for mitigating unfairness. Finally, based on our fairness and explanation taxonomies, we present undiscovered literature paths revealing gaps that can serve as valuable insights for future research.