Abstract: We discuss a vulnerability involving a category of attribution methods used to provide explanations for the outputs of convolutional neural networks working as classifiers. It is well known that this type of network is vulnerable to adversarial attacks, in which imperceptible perturbations of the input may alter the outputs of the model. In contrast, here we focus on the effects that small modifications of the model may have on the attribution method without altering the model outputs.
Abstract: Gradient-based attribution methods for neural networks working as classifiers use gradients of network scores. Here we discuss the practical differences between using gradients of pre-softmax scores and gradients of post-softmax scores, and their respective advantages and disadvantages.
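To make the distinction concrete, the following minimal sketch (assuming PyTorch and a generic classifier model that returns logits; model, x and class_idx are illustrative placeholders, not code from the paper) computes both kinds of gradients of a single class score with respect to the input:

import torch
import torch.nn.functional as F

def input_gradients(model, x, class_idx):
    # x: input tensor of shape (1, C, H, W); model returns pre-softmax logits
    x = x.clone().requires_grad_(True)
    logits = model(x)                        # pre-softmax scores
    probs = F.softmax(logits, dim=1)         # post-softmax scores
    grad_pre = torch.autograd.grad(logits[0, class_idx], x, retain_graph=True)[0]
    grad_post = torch.autograd.grad(probs[0, class_idx], x)[0]
    return grad_pre, grad_post

Note that the post-softmax gradient couples all class scores through the softmax normalization, which is one source of the practical differences the abstract refers to.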
Abstract: Neural networks are becoming increasingly better at tasks that involve classifying and recognizing images. At the same time, techniques intended to explain the network output have been proposed. One such technique is the Gradient-weighted Class Activation Mapping (Grad-CAM), which is able to locate features of an input image at various levels of a convolutional neural network (CNN), but is sensitive to the vanishing-gradients problem. Other techniques, such as Integrated Gradients (IG), are not affected by that problem, but their use is limited to the input layer of the network. Here we introduce a new technique to produce visual explanations for the predictions of a CNN. Like Grad-CAM, our method can be applied to any layer of the network, and like Integrated Gradients it is not affected by the problem of vanishing gradients. For efficiency, gradient integration is performed numerically at the layer level using a Riemann-Stieltjes sum approximation. Compared to Grad-CAM, the heatmaps produced by our algorithm are better focused on the areas of interest, and their numerical computation is more stable. Our code is available at https://github.com/mlerma54/RSIGradCAM
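As a rough illustration of the numerical integration mentioned above (a sketch under assumptions, not the reference implementation from the linked repository), a Riemann-Stieltjes sum accumulates gradients of the class score with respect to the chosen layer's activations against the increments of those activations along a straight path from a baseline input to the actual input. Here activations_fn and score_fn are hypothetical helpers that split the network at the chosen layer:

import torch

def rs_integrated_layer_gradients(activations_fn, score_fn, x, baseline, class_idx, steps=32):
    # activations_fn(x): activations of the chosen layer for input x
    # score_fn(a): class scores computed from layer activations a
    alphas = torch.linspace(0.0, 1.0, steps + 1)
    acts = [activations_fn(baseline + a * (x - baseline)).detach() for a in alphas]
    attrib = torch.zeros_like(acts[0])
    for i in range(steps):
        a = acts[i].clone().requires_grad_(True)
        score = score_fn(a)[0, class_idx]
        grad = torch.autograd.grad(score, a)[0]
        attrib += grad * (acts[i + 1] - acts[i])   # Riemann-Stieltjes increment
    return attrib

The resulting tensor can then be reduced per channel and combined with the layer activations to obtain a heatmap, in the spirit of Grad-CAM.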
Abstract: The Grad-CAM algorithm provides a way to identify which parts of an image contribute most to the output of a classifier deep network. The algorithm is simple and widely used for the localization of objects in an image, although some researchers have pointed out its limitations and proposed various alternatives. One of them is Grad-CAM++, which, according to its authors, can provide better visual explanations for network predictions and does a better job at locating objects, even when multiple instances of an object occur in a single image. Here we show that Grad-CAM++ is practically equivalent to a very simple variation of Grad-CAM in which gradients are replaced with positive gradients.
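A minimal sketch of that variation (assuming the feature maps activations and the gradients grads of the class score with respect to them have already been collected at the chosen layer, with shape (1, K, H, W); names are illustrative):

import torch

def cam_from_positive_gradients(activations, grads):
    # channel weights from positive gradients only (ReLU applied to the gradients),
    # instead of the plain spatial average of gradients used by Grad-CAM
    weights = torch.relu(grads).mean(dim=(2, 3), keepdim=True)
    cam = torch.relu((weights * activations).sum(dim=1))  # weighted sum of feature maps
    return cam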
Abstract: We discuss a way to find a well-behaved baseline for attribution methods that work by feeding a neural network a sequence of inputs interpolated between two given inputs. We then test it with our novel Riemann-Stieltjes Integrated Gradient-weighted Class Activation Mapping (RSI-Grad-CAM) attribution method.
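For reference, the interpolation the abstract alludes to is a straight-line path between the two inputs; a minimal sketch (PyTorch, illustrative names) that produces the sequence of interpolated inputs fed to the network is:

import torch

def interpolated_inputs(x0, x1, steps=50):
    # x0: baseline input, x1: actual input, same shape; returns steps + 1 inputs
    alphas = torch.linspace(0.0, 1.0, steps + 1).view(-1, *([1] * x0.dim()))
    return x0.unsqueeze(0) + alphas * (x1 - x0).unsqueeze(0)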
Abstract: We provide rigorous proofs that the Integrated Gradients (IG) attribution method for deep networks satisfies completeness and symmetry-preserving properties. We also study the uniqueness of IG as a symmetry-preserving path method.
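For context, the standard statement of these properties (with input $x$, baseline $x'$, and network function $F$) is: the attribution of feature $i$ is

$$\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F\bigl(x' + \alpha (x - x')\bigr)}{\partial x_i}\, d\alpha,$$

completeness asserts that $\sum_i \mathrm{IG}_i(x) = F(x) - F(x')$, and symmetry preservation asserts that two variables in which $F$ is symmetric receive equal attributions whenever they take equal values in both $x$ and $x'$.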