A number of interpretability techniques have been presented for deep learning in computer vision, typically with the goal of understanding what the networks have actually learned that underlies a given classification decision. However, when it comes to deep video architectures, interpretability is still in its infancy, and we do not yet have a clear concept of how spatiotemporal features should be decoded. In this paper, we present a study comparing how 3D convolutional networks and convolutional LSTM networks learn features across temporally dependent frames. This is the first comparison of two video models that both learn spatial features through convolution but that have fundamentally different ways of modeling time. Additionally, we extend the concept of meaningful perturbation introduced by Fong & Vedaldi (2017) to the temporal dimension, in order to identify the part of a sequence that is most meaningful to a network's classification decision.
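As a rough illustration of what such a temporal extension could look like, the sketch below optimizes a per-frame deletion mask in the spirit of Fong & Vedaldi's "deletion game": frames are softly replaced by a reference signal (here, the clip's temporal mean) so that the target class score collapses while as few frames as possible are deleted. This is a minimal sketch under stated assumptions, not the paper's exact formulation; the function name, the temporal-mean baseline, and the regularization weights are all illustrative choices.

```python
import torch


def find_meaningful_frames(model, clip, target_class, steps=300, lr=0.05,
                           lam_l1=0.1, lam_tv=0.1):
    """Optimize a per-frame mask m in [0, 1]^T over a video clip.

    m[t] = 1 keeps frame t; m[t] = 0 replaces it with a reference frame
    (here: the clip's per-pixel temporal mean). Minimizing the target
    class score plus an L1 penalty on deletions yields the smallest set
    of frames whose removal destroys the class evidence, i.e. the most
    meaningful part of the sequence for the decision.

    Assumes `model` maps a clip of shape (1, T, C, H, W) to class logits.
    """
    T = clip.shape[1]
    baseline = clip.mean(dim=1, keepdim=True)  # reference "perturbed" frame
    m = torch.full((1, T, 1, 1, 1), 0.5, requires_grad=True)
    opt = torch.optim.Adam([m], lr=lr)

    for _ in range(steps):
        # Blend each frame with the baseline according to its mask value.
        perturbed = m * clip + (1 - m) * baseline
        score = torch.softmax(model(perturbed), dim=-1)[0, target_class]

        # L1 term: delete as little as possible. TV term: prefer
        # temporally contiguous deletions rather than isolated frames.
        l1 = (1 - m).abs().mean()
        tv = (m[:, 1:] - m[:, :-1]).pow(2).mean()
        loss = score + lam_l1 * l1 + lam_tv * tv

        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            m.clamp_(0, 1)  # keep the mask a valid blending weight

    # Low mask values mark the frames most responsible for the decision.
    return m.detach().squeeze()
```

In this sketch, the frames that the optimizer chooses to delete (mask values near 0) are exactly those whose removal most damages the classification, so the inverted mask can be read as a per-frame relevance profile for the clip.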