The last decade of machine learning has seen dramatic increases in scale and capabilities, and deep neural networks (DNNs) are increasingly being deployed across a wide range of domains. However, the inner workings of DNNs are generally difficult to understand, raising concerns about the safety of using these systems without a rigorous understanding of how they function. In this survey, we review the literature on techniques for interpreting the inner components of DNNs, which we call "inner" interpretability methods. Specifically, we review methods for interpreting weights, neurons, subnetworks, and latent representations, with a focus on how these techniques relate to the goal of designing safer, more trustworthy AI systems. We also highlight connections between interpretability and work on modularity, adversarial robustness, continual learning, network compression, and the study of the human visual system. Finally, we discuss key challenges and argue that future work on interpretability for AI safety should focus on diagnostics, benchmarking, and robustness.