Abstract: We study the convergence properties of gradient descent for training deep linear neural networks, i.e., deep matrix factorizations, by extending a previous analysis of the related gradient flow. We show that, under suitable conditions on the step sizes, gradient descent converges to a critical point of the loss function, which in this article is the square loss. Furthermore, we demonstrate that for almost all initializations gradient descent converges to a global minimum in the case of two layers. In the case of three or more layers, we show that gradient descent converges to a global minimum on the manifold of matrices of some fixed rank, where the rank cannot be determined a priori.
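As a brief sketch of the setting described above (the notation $W_j$, $X$, $Y$, $N$, and $\eta_k$ below is illustrative and not fixed by the abstract), the square loss of an $N$-layer linear network and the corresponding gradient descent iteration can be written as
\[
L(W_1,\dots,W_N) = \tfrac{1}{2}\bigl\| W_N W_{N-1} \cdots W_1 X - Y \bigr\|_F^2, \qquad
W_j^{(k+1)} = W_j^{(k)} - \eta_k \,\nabla_{W_j} L\bigl(W_1^{(k)},\dots,W_N^{(k)}\bigr), \quad j = 1,\dots,N,
\]
where the step sizes $\eta_k$ are the quantities required to satisfy the suitable conditions mentioned above.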
Abstract: We study the convergence of gradient flows related to learning deep linear neural networks from data (i.e., the activation function is the identity map). In this case, the composition of the network layers amounts to simply multiplying the weight matrices of all layers together, resulting in an overparameterized problem. We show that the gradient flow with respect to these factors can be reinterpreted as a Riemannian gradient flow on the manifold of rank-$r$ matrices endowed with a suitable Riemannian metric. We show that the flow always converges to a critical point of the underlying functional. Moreover, in the special case of an autoencoder, we show that the flow converges to a global minimum for almost all initializations.
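For comparison, a sketch of the gradient flow setting (again with illustrative notation not fixed by the abstract): the weight matrices of the $N$ layers evolve according to
\[
\dot W_j(t) = -\nabla_{W_j} L\bigl(W_1(t),\dots,W_N(t)\bigr), \quad j = 1,\dots,N,
\]
and the statement above is that the induced dynamics of the product $W(t) = W_N(t) \cdots W_1(t)$ can be viewed as a Riemannian gradient flow on the manifold of rank-$r$ matrices equipped with a suitable Riemannian metric.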