Abstract:The failure of deep neural networks to generalize to out-of-distribution data is a well-known problem and raises concerns about the deployment of trained networks in safety-critical domains such as healthcare, finance and autonomous vehicles. We study a particular kind of distribution shift $\unicode{x2013}$ shortcuts or spurious correlations in the training data. Shortcut learning is often only exposed when models are evaluated on real-world data that does not contain the same spurious correlations, posing a serious dilemma for AI practitioners to properly assess the effectiveness of a trained model for real-world applications. In this work, we propose to use the mutual information (MI) between the learned representation and the input as a metric to find where in training, the network latches onto shortcuts. Experiments demonstrate that MI can be used as a domain-agnostic metric for monitoring shortcut learning.
Abstract:Although machine learning (ML) has been successful in automating various software engineering needs, software testing still remains a highly challenging topic. In this paper, we aim to improve the generative testing of software by directly augmenting the random number generator (RNG) with a deep reinforcement learning (RL) agent using an efficient, automatically extractable state representation of the software under test. Using the Cosmos SDK as the testbed, we show that the proposed DeepRNG framework provides a statistically significant improvement to the testing of the highly complex software library with over 350,000 lines of code. The source code of the DeepRNG framework is publicly available online.
Abstract:Recent advancements in self-supervised learning have reduced the gap between supervised and unsupervised representation learning. However, most self-supervised and deep clustering techniques rely heavily on data augmentation, rendering them ineffective for many learning tasks where insufficient domain knowledge exists for performing augmentation. We propose a new self-distillation based algorithm for domain-agnostic clustering. Our method builds upon the existing deep clustering frameworks and requires no separate student model. The proposed method outperforms existing domain agnostic (augmentation-free) algorithms on CIFAR-10. We empirically demonstrate that knowledge distillation can improve unsupervised representation learning by extracting richer `dark knowledge' from the model than using predicted labels alone. Preliminary experiments also suggest that self-distillation improves the convergence of DeepCluster-v2.
Abstract:Generative adversarial networks (GANs) are capable of generating strikingly realistic samples but state-of-the-art GANs can be extremely computationally expensive to train. In this paper, we propose the fused propagation (FusedProp) algorithm which can be used to efficiently train the discriminator and the generator of common GANs simultaneously using only one forward and one backward propagation. We show that FusedProp achieves 1.49 times the training speed compared to the conventional training of GANs, although further studies are required to improve its stability. By reporting our preliminary results and open-sourcing our implementation, we hope to accelerate future research on the training of GANs.
Abstract:We present a novel neural network algorithm, the Tensor Switching (TS) network, which generalizes the Rectified Linear Unit (ReLU) nonlinearity to tensor-valued hidden units. The TS network copies its entire input vector to different locations in an expanded representation, with the location determined by its hidden unit activity. In this way, even a simple linear readout from the TS representation can implement a highly expressive deep-network-like function. The TS network hence avoids the vanishing gradient problem by construction, at the cost of larger representation size. We develop several methods to train the TS network, including equivalent kernels for infinitely wide and deep TS networks, a one-pass linear learning algorithm, and two backpropagation-inspired representation learning algorithms. Our experimental results demonstrate that the TS network is indeed more expressive and consistently learns faster than standard ReLU networks.
Abstract:A central challenge in sensory neuroscience is describing how the activity of populations of neurons can represent useful features of the external environment. However, while neurophysiologists have long been able to record the responses of neurons in awake, behaving animals, it is another matter entirely to say what a given neuron does. A key problem is that in many sensory domains, the space of all possible stimuli that one might encounter is effectively infinite; in vision, for instance, natural scenes are combinatorially complex, and an organism will only encounter a tiny fraction of possible stimuli. As a result, even describing the response properties of sensory neurons is difficult, and investigations of neuronal functions are almost always critically limited by the number of stimuli that can be considered. In this paper, we propose a closed-loop, optimization-based experimental framework for characterizing the response properties of sensory neurons, building on past efforts in closed-loop experimental methods, and leveraging recent advances in artificial neural networks to serve as as a proving ground for our techniques. Specifically, using deep convolutional neural networks, we asked whether modern black-box optimization techniques can be used to interrogate the "tuning landscape" of an artificial neuron in a deep, nonlinear system, without imposing significant constraints on the space of stimuli under consideration. We introduce a series of measures to quantify the tuning landscapes, and show how these relate to the performances of the networks in an object recognition task. To the extent that deep convolutional neural networks increasingly serve as de facto working hypotheses for biological vision, we argue that developing a unified approach for studying both artificial and biological systems holds great potential to advance both fields together.