Hanna Tseran

Mildly Overparameterized ReLU Networks Have a Favorable Loss Landscape

May 31, 2023

Kedar Karhadkar, Michael Murray, Hanna Tseran, Guido Montúfar

Abstract: We study the loss landscape of two-layer mildly overparameterized ReLU neural networks on a generic finite input dataset for the squared error loss. Our approach involves bounding the dimension of the sets of local and global minima using the rank of the Jacobian of the parameterization map. Using results on random binary matrices, we show that most activation patterns correspond to parameter regions with no bad differentiable local minima. Furthermore, for one-dimensional input data, we show that most activation regions realizable by the network contain a high-dimensional set of global minima and no bad local minima. We experimentally confirm these results by finding a phase transition from most regions having full rank to many regions having deficient rank, depending on the amount of overparameterization.

* 27 pages
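
The rank argument in the abstract above can be illustrated with a small numerical sketch. It assumes a two-layer network of the form f(x) = sum_i v_i relu(w_i . x) without biases, and the names `forward`, `jacobian`, `n`, `d0`, and `d1` are hypothetical choices for this illustration, not taken from the paper. Inside a fixed activation region the Jacobian J of the map from parameters to the n outputs is well defined, and the gradient of the squared error is J^T (F - y). If J has rank n, that gradient can only vanish when F = y, so a differentiable critical point in such a region is a global minimum.

```python
import numpy as np

def forward(W, v, X):
    # Outputs of f(x) = sum_i v_i * relu(w_i . x) on every row of X.
    return np.maximum(X @ W.T, 0.0) @ v

def jacobian(W, v, X):
    # Jacobian of the n outputs with respect to (W flattened row-major, v).
    pre = X @ W.T                          # (n, d1) pre-activations
    A = (pre > 0).astype(float)            # activation pattern, entries in {0, 1}
    # d f(x_j) / d W[i, :] = v_i * A[j, i] * x_j
    J_W = (A * v)[:, :, None] * X[:, None, :]   # (n, d1, d0)
    # d f(x_j) / d v_i = relu(w_i . x_j)
    J_v = np.maximum(pre, 0.0)             # (n, d1)
    return np.concatenate([J_W.reshape(len(X), -1), J_v], axis=1)

rng = np.random.default_rng(0)
n, d0, d1 = 5, 3, 10                       # illustrative sizes (assumption)
X = rng.standard_normal((n, d0))
W = rng.standard_normal((d1, d0))
v = rng.standard_normal(d1)

J = jacobian(W, v, X)
rank = np.linalg.matrix_rank(J)
print("Jacobian shape:", J.shape)
print("rank:", rank, "full rank:", rank == n)
```

Repeating this over many random draws of `W` and `v` for different widths `d1` is one way to probe how often the rank is deficient; the paper's own experimental protocol may differ.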


Expected Gradients of Maxout Networks and Consequences to Parameter Initialization

Jan 17, 2023

Hanna Tseran, Guido Montúfar

Abstract: We study the gradients of a maxout network with respect to inputs and parameters and obtain bounds for the moments depending on the architecture and the parameter distribution. We observe that the distribution of the input-output Jacobian depends on the input, which complicates a stable parameter initialization. Based on the moments of the gradients, we formulate parameter initialization strategies that avoid vanishing and exploding gradients in wide networks. Experiments with deep fully-connected and convolutional networks show that this strategy improves SGD and Adam training of deep maxout networks. In addition, we obtain refined bounds on the expected number of linear regions, results on the expected curve length distortion, and results on the NTK.

* 37 pages, 8 figures
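
As a minimal sketch of why the input-output Jacobian of a maxout network depends on the input, consider a single maxout layer in which each unit returns the maximum of K affine pre-activations. The function names, the shapes, and the standard normal weights below are assumptions made for illustration; the initialization strategy proposed in the paper is not reproduced here. The Jacobian at a point consists of the weight rows of whichever pre-activations attain the maximum, so it changes whenever the input moves into a region where a different piece wins.

```python
import numpy as np

def maxout_layer(W, b, x):
    # W: (units, K, d_in), b: (units, K), x: (d_in,)
    pre = W @ x + b                 # (units, K) affine pre-activations
    idx = pre.argmax(axis=1)        # index of the winning piece per unit
    return pre.max(axis=1), idx

def maxout_jacobian(W, idx):
    # Row u is the weight vector of the piece that attains the max in unit u.
    return W[np.arange(W.shape[0]), idx]   # (units, d_in)

rng = np.random.default_rng(0)
units, K, d_in = 4, 3, 5            # illustrative sizes (assumption)
W = rng.standard_normal((units, K, d_in))
b = rng.standard_normal((units, K))

x_a = rng.standard_normal(d_in)
x_b = rng.standard_normal(d_in)
y_a, idx_a = maxout_layer(W, b, x_a)
y_b, idx_b = maxout_layer(W, b, x_b)
J_a = maxout_jacobian(W, idx_a)
J_b = maxout_jacobian(W, idx_b)

print("winning pieces at x_a:", idx_a)
print("winning pieces at x_b:", idx_b)
print("Jacobian norms:", np.linalg.norm(J_a), np.linalg.norm(J_b))
```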


On the Expected Complexity of Maxout Networks

Jul 01, 2021

Hanna Tseran, Guido Montúfar

Abstract: Learning with neural networks relies on the complexity of the representable functions but, more importantly, on the particular assignment of typical parameters to functions of different complexity. Taking the number of activation regions as a complexity measure, recent works have shown that the practical complexity of deep ReLU networks is often far from the theoretical maximum. In this work we show that this phenomenon also occurs in networks with maxout (multi-argument) activation functions and when considering the decision boundaries in classification tasks. We also show that the parameter space has a multitude of full-dimensional regions with widely different complexity, and obtain nontrivial lower bounds on the expected complexity. Finally, we investigate different parameter initialization procedures and show that they can increase the speed of convergence in training.

* 41 pages, 18 figures
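
The idea of using activation regions as a complexity measure can be sketched numerically. The example below assumes a single maxout layer with standard normal parameters and counts how many linear pieces a line segment through input space crosses, by sampling points on the segment and counting changes in the pattern of winning pre-activations. The function name `count_pieces_on_line` and all sizes are hypothetical, and sampling on a grid can miss very short pieces, so the count is a lower estimate. For one layer, each unit is the upper envelope of K lines along the segment and switches its winning piece at most K - 1 times, so the count cannot exceed units * (K - 1) + 1.

```python
import numpy as np

def count_pieces_on_line(W, b, x0, x1, num=2001):
    # W: (units, K, d_in), b: (units, K); segment from x0 to x1.
    ts = np.linspace(0.0, 1.0, num)
    pts = x0 + ts[:, None] * (x1 - x0)                   # (num, d_in)
    pre = np.einsum("ukd,nd->nuk", W, pts) + b          # (num, units, K)
    patterns = pre.argmax(axis=2)                        # (num, units)
    # A new piece starts wherever any unit switches its winning argument.
    changes = np.any(patterns[1:] != patterns[:-1], axis=1)
    return 1 + int(changes.sum())

rng = np.random.default_rng(0)
units, K, d_in = 20, 3, 2            # illustrative sizes (assumption)
W = rng.standard_normal((units, K, d_in))
b = rng.standard_normal((units, K))
x0 = np.array([-3.0, -3.0])
x1 = np.array([3.0, 3.0])

pieces = count_pieces_on_line(W, b, x0, x1)
upper = units * (K - 1) + 1
print("pieces found on the segment:", pieces)
print("upper bound for one layer:", upper)
```

Comparing `pieces` with `upper` over many random draws gives a rough picture of how far a typical parameter choice sits from the maximum for this toy setting; it is not a reproduction of the paper's experiments.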
