Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Explaining Classifiers with Causal Concept Effect (CaCE)

Jul 16, 2019

Yash Goyal, Uri Shalit, Been Kim

Figure 1 for Explaining Classifiers with Causal Concept Effect (CaCE)

Figure 2 for Explaining Classifiers with Causal Concept Effect (CaCE)

Figure 3 for Explaining Classifiers with Causal Concept Effect (CaCE)

Figure 4 for Explaining Classifiers with Causal Concept Effect (CaCE)

Share this with someone who'll enjoy it:

Abstract:How can we understand classification decisions made by deep neural nets? We propose answering this question by using ideas from causal inference. We define the ``Causal Concept Effect'' (CaCE) as the causal effect that the presence or absence of a concept has on the prediction of a given deep neural net. We then use this measure as a mean to understand what drives the network's prediction and what does not. Yet many existing interpretability methods rely solely on correlations, resulting in potentially misleading explanations. We show how CaCE can avoid such mistakes. In high-risk domains such as medicine, knowing the root cause of the prediction is crucial. If we knew that the network's prediction was caused by arbitrary concepts such as the lighting conditions in an X-ray room instead of medically meaningful concept, this would prevent us from disastrous deployment of such models. Estimating CaCE is difficult in situations where we cannot easily simulate the do-operator. As a simple solution, we propose learning a generative model, specifically a Variational AutoEncoder (VAE) on image pixels or image embeddings extracted from the classifier to measure VAE-CaCE. We show that VAE-CaCE is able to correctly estimate the true causal effect as compared to other baselines in controlled settings with synthetic and semi-natural high dimensional images.

View paper on

Share this with someone who'll enjoy it:

Title:Explaining Classifiers with Causal Concept Effect (CaCE)

Paper and Code