Abstract:The increasing use of deep neural networks (DNNs) has motivated a parallel endeavor: the design of adversaries that profit from successful misclassifications. However, not all adversarial examples are crafted for malicious purposes. For example, real world systems often contain physical, temporal, and sampling variability across instrumentation. Adversarial examples in the wild may inadvertently prove deleterious for accurate predictive modeling. Conversely, naturally occurring covariance of image features may serve didactic purposes. Here, we studied the stability of deep learning representations for neuroimaging classification across didactic and adversarial conditions characteristic of MRI acquisition variability. We show that representational similarity and performance vary according to the frequency of adversarial examples in the input space.