Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Daniel A. Abolafia

Towards Modular Algorithm Induction

Feb 27, 2020

Daniel A. Abolafia, Rishabh Singh, Manzil Zaheer, Charles Sutton

Figure 1 for Towards Modular Algorithm Induction

Figure 2 for Towards Modular Algorithm Induction

Figure 3 for Towards Modular Algorithm Induction

Figure 4 for Towards Modular Algorithm Induction

Abstract:We present a modular neural network architecture Main that learns algorithms given a set of input-output examples. Main consists of a neural controller that interacts with a variable-length input tape and learns to compose modules together with their corresponding argument choices. Unlike previous approaches, Main uses a general domain-agnostic mechanism for selection of modules and their arguments. It uses a general input tape layout together with a parallel history tape to indicate most recently used locations. Finally, it uses a memoryless controller with a length-invariant self-attention based input tape encoding to allow for random access to tape locations. The Main architecture is trained end-to-end using reinforcement learning from a set of input-output examples. We evaluate Main on five algorithmic tasks and show that it can learn policies that generalizes perfectly to inputs of much longer lengths than the ones used for training.

* 10 pages, 4 figures, 2 tables

Via

Access Paper or Ask Questions

Bayesian Convolutional Neural Networks with Many Channels are Gaussian Processes

Oct 11, 2018

Roman Novak, Lechao Xiao, Jaehoon Lee, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein

Figure 1 for Bayesian Convolutional Neural Networks with Many Channels are Gaussian Processes

Figure 2 for Bayesian Convolutional Neural Networks with Many Channels are Gaussian Processes

Figure 3 for Bayesian Convolutional Neural Networks with Many Channels are Gaussian Processes

Figure 4 for Bayesian Convolutional Neural Networks with Many Channels are Gaussian Processes

Abstract:There is a previously identified equivalence between wide fully connected neural networks (FCNs) and Gaussian processes (GPs). This equivalence enables, for instance, test set predictions that would have resulted from a fully Bayesian, infinitely wide trained FCN to be computed without ever instantiating the FCN, but by instead evaluating the corresponding GP. In this work, we derive an analogous equivalence for multi-layer convolutional neural networks (CNNs) both with and without pooling layers, and achieve state of the art results on CIFAR10 for GPs without trainable kernels. We also introduce a Monte Carlo method to estimate the GP corresponding to a given neural network architecture, even in cases where the analytic form has too many terms to be computationally feasible. Surprisingly, in the absence of pooling layers, the GPs corresponding to CNNs with and without weight sharing are identical. As a consequence, translation equivariance in finite-channel CNNs trained with stochastic gradient descent (SGD) has no corresponding property in the Bayesian treatment of the infinite channel limit - a qualitative difference between the two regimes that is not present in the FCN case. We confirm experimentally, that while in some scenarios the performance of SGD-trained finite CNNs approaches that of the corresponding GPs as the channel count increases, with careful tuning SGD-trained CNNs can significantly outperform their corresponding GPs, suggesting advantages from SGD training compared to fully Bayesian parameter estimation.

* 26 pages, 7 figures

Via

Access Paper or Ask Questions

Sensitivity and Generalization in Neural Networks: an Empirical Study

Jun 18, 2018

Roman Novak, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein

Figure 1 for Sensitivity and Generalization in Neural Networks: an Empirical Study

Figure 2 for Sensitivity and Generalization in Neural Networks: an Empirical Study

Figure 3 for Sensitivity and Generalization in Neural Networks: an Empirical Study

Figure 4 for Sensitivity and Generalization in Neural Networks: an Empirical Study

Abstract:In practice it is often found that large over-parameterized neural networks generalize better than their smaller counterparts, an observation that appears to conflict with classical notions of function complexity, which typically favor smaller models. In this work, we investigate this tension between complexity and generalization through an extensive empirical exploration of two natural metrics of complexity related to sensitivity to input perturbations. Our experiments survey thousands of models with various fully-connected architectures, optimizers, and other hyper-parameters, as well as four different image classification datasets. We find that trained neural networks are more robust to input perturbations in the vicinity of the training data manifold, as measured by the norm of the input-output Jacobian of the network, and that it correlates well with generalization. We further establish that factors associated with poor generalization $-$ such as full-batch training or using random labels $-$ correspond to lower robustness, while factors associated with good generalization $-$ such as data augmentation and ReLU non-linearities $-$ give rise to more robust functions. Finally, we demonstrate how the input-output Jacobian norm can be predictive of generalization at the level of individual test points.

* Published as a conference paper at ICLR 2018

Via

Access Paper or Ask Questions

Neural Program Synthesis with Priority Queue Training

Mar 23, 2018

Daniel A. Abolafia, Mohammad Norouzi, Jonathan Shen, Rui Zhao, Quoc V. Le

Figure 1 for Neural Program Synthesis with Priority Queue Training

Figure 2 for Neural Program Synthesis with Priority Queue Training

Figure 3 for Neural Program Synthesis with Priority Queue Training

Figure 4 for Neural Program Synthesis with Priority Queue Training

Abstract:We consider the task of program synthesis in the presence of a reward function over the output of programs, where the goal is to find programs with maximal rewards. We employ an iterative optimization scheme, where we train an RNN on a dataset of K best programs from a priority queue of the generated programs so far. Then, we synthesize new programs and add them to the priority queue by sampling from the RNN. We benchmark our algorithm, called priority queue training (or PQT), against genetic algorithm and reinforcement learning baselines on a simple but expressive Turing complete programming language called BF. Our experimental results show that our simple PQT algorithm significantly outperforms the baselines. By adding a program length penalty to the reward function, we are able to synthesize short, human readable programs.

Via

Access Paper or Ask Questions