Abstract: Counterfactual explanations are an increasingly popular form of post hoc explanation due to their (i) applicability across problem domains, (ii) proposed legal compliance (e.g., with GDPR), and (iii) reliance on the contrastive nature of human explanation. Although counterfactual explanations are normally used to explain individual predictive instances, we explore a novel use case in which groups of similar instances are explained in a collective fashion using ``group counterfactuals'' (e.g., to highlight a repeating pattern of illness in a group of patients). These group counterfactuals meet a human preference for coherent, broad explanations covering multiple events/instances. A novel group-counterfactual algorithm is proposed to generate high-coverage explanations that are faithful to the to-be-explained model. This explanation strategy is also evaluated in a large, controlled user study (N=207), using objective (i.e., accuracy) and subjective (i.e., confidence, explanation satisfaction, and trust) psychological measures. The results show that group counterfactuals elicit modest but definite improvements in people's understanding of an AI system. The implications of these findings for counterfactual methods and for XAI are discussed.
Abstract: Counterfactual explanations are increasingly used to address interpretability, recourse, and bias in AI decisions. However, we do not know how well counterfactual explanations help users to understand a system's decisions, since no large-scale user studies have compared their efficacy to other sorts of explanation, such as causal explanations (which have a longer track record of use in rule-based and decision-tree models). It is also unknown whether counterfactual explanations are equally effective for categorical as for continuous features, although current methods assume they are. Hence, in a controlled user study with 127 volunteer participants, we tested the effects of counterfactual and causal explanations on the objective accuracy of users' predictions of the decisions made by a simple AI system, and on participants' subjective judgments of satisfaction and trust in the explanations. We discovered a dissociation between objective and subjective measures: counterfactual explanations elicit higher prediction accuracy than no-explanation control descriptions, but no higher accuracy than causal explanations; yet counterfactual explanations elicit greater satisfaction and trust than causal explanations. We also found that users understand explanations referring to categorical features more readily than those referring to continuous features. We discuss the implications of these findings for current and future counterfactual methods in XAI.