Title: Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data

Oct 13, 2022

Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro, Wei Hu

Abstract: The implicit biases of gradient-based optimization algorithms are conjectured to be a major factor in the success of modern deep learning. In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations when the training data are nearly orthogonal, a common property of high-dimensional data. For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that asymptotically, gradient flow produces a neural network with rank at most two. Moreover, this network is an $\ell_2$-max-margin solution (in parameter space), and has a linear decision boundary that corresponds to an approximate-max-margin linear predictor. For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training. We provide experiments which suggest that a small initialization scale is important for finding low-rank neural networks with gradient descent.
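The single-step rank reduction described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' code. It assumes a fixed second layer with entries ±1/√m, the logistic loss, Gaussian inputs with n much smaller than d (so the samples are nearly orthogonal), and the stable rank ‖W‖_F² / ‖W‖₂² as the notion of rank. The names `one_gd_step`, `stable_rank`, and `demo`, and all hyperparameter values, are hypothetical choices made for illustration.

```python
import numpy as np

def leaky_relu(z, gamma=0.1):
    # phi(z) = z for z > 0, gamma * z otherwise
    return np.where(z > 0, z, gamma * z)

def leaky_relu_grad(z, gamma=0.1):
    # phi'(z) = 1 for z > 0, gamma otherwise
    return np.where(z > 0, 1.0, gamma)

def stable_rank(W):
    # Stable rank: squared Frobenius norm over squared spectral norm.
    s = np.linalg.svd(W, compute_uv=False)
    return float(np.sum(s ** 2) / s[0] ** 2)

def one_gd_step(W, a, X, y, lr, gamma=0.1):
    # Network (assumed form): f(x) = sum_j a_j * phi(<w_j, x>), with a held fixed.
    n = X.shape[0]
    Z = X @ W.T                          # pre-activations, shape (n, m)
    f = leaky_relu(Z, gamma) @ a         # network outputs, shape (n,)
    # Logistic loss l(q) = log(1 + exp(-q)), so l'(q) = -1 / (1 + exp(q)).
    g = -1.0 / (1.0 + np.exp(y * f))     # shape (n,)
    # dL/dw_j = (1/n) * sum_i l'(y_i f_i) * y_i * a_j * phi'(Z_ij) * x_i
    coef = (g * y)[:, None] * leaky_relu_grad(Z, gamma) * a[None, :]
    grad = coef.T @ X / n                # shape (m, d)
    return W - lr * grad

def demo(n=20, d=2000, m=64, init_scale=1e-6, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # With n << d, Gaussian samples are nearly orthogonal:
    # ||x_i||^2 is about d, while |<x_i, x_k>| is about sqrt(d).
    X = rng.standard_normal((n, d))
    y = rng.choice([-1.0, 1.0], size=n)
    a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)
    W0 = init_scale * rng.standard_normal((m, d))   # small random initialization
    W1 = one_gd_step(W0, a, X, y, lr)
    return stable_rank(W0), stable_rank(W1)

if __name__ == "__main__":
    before, after = demo()
    print(f"stable rank at init: {before:.2f}, after one step: {after:.2f}")
```

With a small `init_scale`, the first gradient dominates the initial weights, so the stable rank after one step should be far below its value at initialization. Increasing `init_scale` is a simple way to probe the abstract's remark that a small initialization scale matters.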

* 54 pages

View paper on OpenReview
