We propose to learn model invariances as a means of interpreting a model. This is motivated by a reverse-engineering principle: if we understand a problem, we may introduce inductive biases into our model in the form of invariances. Conversely, when interpreting a complex supervised model, we can study its invariances to understand how that model solves the problem. To this end, we propose a supervised form of variational auto-encoders (VAEs). Crucially, only a subset of the dimensions in the latent space contributes to the supervised task, allowing the remaining dimensions to act as nuisance parameters. By sampling only the nuisance dimensions, we can generate samples that have undergone transformations that leave the classification unchanged, revealing the invariances of the model. Our experimental results demonstrate the capability of the proposed model both in terms of classification and in the generation of invariantly transformed samples. Finally, we show how combining our model with feature-attribution methods makes it possible to reach a more fine-grained understanding of the model's decision process.
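To make the split latent space concrete, the following is a minimal sketch of the idea described above, not the paper's actual implementation: a VAE whose latent code is partitioned into task dimensions (fed to a classifier) and nuisance dimensions (used only by the decoder). All class names, layer sizes, and loss weights here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SupervisedVAE(nn.Module):
    """Sketch: latent code split into z_task (drives the classifier)
    and z_nuis (seen only by the decoder). Sizes are arbitrary."""

    def __init__(self, x_dim=784, z_task=4, z_nuis=12, n_classes=10, hidden=256):
        super().__init__()
        self.z_task, self.z_nuis = z_task, z_nuis
        z_dim = z_task + z_nuis
        self.encoder = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 2 * z_dim))   # outputs mu and log-variance
        self.decoder = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, x_dim))
        self.classifier = nn.Linear(z_task, n_classes)               # uses the task dimensions only

    def forward(self, x):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()         # reparameterisation trick
        x_rec = self.decoder(z)                                      # reconstruction from the full code
        logits = self.classifier(z[:, :self.z_task])                 # classification from z_task alone
        return x_rec, logits, mu, logvar

def loss_fn(x, y, x_rec, logits, mu, logvar, beta=1.0, gamma=1.0):
    """ELBO-style objective (reconstruction + KL) plus a supervised
    cross-entropy term; beta and gamma are assumed trade-off weights."""
    rec = F.mse_loss(x_rec, x, reduction="sum") / x.size(0)
    kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1))
    ce = F.cross_entropy(logits, y)
    return rec + beta * kl + gamma * ce

@torch.no_grad()
def sample_invariances(model, x, n_samples=8):
    """Hold the task dimensions of x fixed and resample only the nuisance
    dimensions; the decoded outputs show transformations the classifier
    is (approximately) invariant to."""
    mu, _ = model.encoder(x).chunk(2, dim=-1)
    z_task = mu[:, :model.z_task].repeat_interleave(n_samples, dim=0)
    z_nuis = torch.randn(z_task.size(0), model.z_nuis)               # resample nuisance dims from the prior
    return model.decoder(torch.cat([z_task, z_nuis], dim=-1))
```

In this sketch, interpretation proceeds by calling `sample_invariances` on an input of interest and inspecting the decoded samples: variation across them reflects factors the classifier ignores, while the factors it relies on stay fixed.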