Title: Deep learning, stochastic gradient descent and diffusion maps

Apr 06, 2022

Carmina Fjellström, Kaj Nyström

[Figures 1 to 4 of the paper]

Abstract: Stochastic gradient descent (SGD) is widely used in deep learning because of its computational efficiency, but a complete understanding of why SGD performs so well remains a major challenge. It has been observed empirically that most eigenvalues of the Hessian of the loss function on the loss landscape of over-parametrized deep networks are close to zero, while only a small number of eigenvalues are large. Zero eigenvalues indicate zero diffusion along the corresponding directions. This suggests that the process of minima selection happens mainly in the relatively low-dimensional subspace corresponding to the top eigenvalues of the Hessian. Although the parameter space is very high-dimensional, these findings seem to indicate that the SGD dynamics may mainly live on a low-dimensional manifold. In this paper we pursue a truly data-driven approach to gaining a potentially deeper understanding of the high-dimensional parameter surface, and in particular of the landscape traced out by SGD: we analyze the data generated by SGD, or by any other optimizer, in order to discover (local) low-dimensional representations of the optimization landscape. As our vehicle for this exploration we use diffusion maps, introduced by R. Coifman and coauthors.
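The observation that directions with zero Hessian eigenvalue receive no diffusion, so SGD stays near a low-dimensional set, can be illustrated with a small sketch. This is not the authors' code. It assumes a hypothetical toy problem: over-parametrized least squares with 3 samples and 50 parameters. The names `sgd_trajectory`, `diffusion_map`, `X` and `y` are illustrative only. The sketch records every SGD iterate and then embeds the trajectory with a standard Coifman-Lafon diffusion map, using alpha = 1 density normalization. The kernel bandwidth heuristic, the median of the non-zero squared distances, is also an assumption.

```python
import numpy as np


def diffusion_map(Z, n_components=2, epsilon=None, alpha=1.0, t=1):
    """Diffusion-map embedding of the rows of Z (Coifman-Lafon construction)."""
    # Pairwise squared Euclidean distances between the samples (rows of Z).
    sq = np.sum(Z**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    if epsilon is None:
        # Heuristic bandwidth (assumption): median of the non-zero squared distances.
        epsilon = np.median(D2[D2 > 0])
    K = np.exp(-D2 / epsilon)
    # alpha = 1 normalization factors out the sampling density of the points.
    q = K.sum(axis=1)
    K_alpha = K / np.outer(q**alpha, q**alpha)
    # Symmetric conjugate S = D^{-1/2} K_alpha D^{-1/2} of the Markov matrix P = D^{-1} K_alpha.
    d = K_alpha.sum(axis=1)
    S = K_alpha / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    # Right eigenvectors of P are psi = D^{-1/2} phi.
    psi = evecs / np.sqrt(d)[:, None]
    # Drop the trivial constant eigenvector (eigenvalue 1) and scale by lambda^t.
    coords = psi[:, 1:n_components + 1] * evals[1:n_components + 1] ** t
    return coords, evals


def sgd_trajectory(X, y, w0, rng, lr=0.05, steps=200):
    """Single-sample SGD on the loss 0.5 * (x_i . w - y_i)^2; returns every iterate."""
    w = w0.copy()
    traj = [w.copy()]
    for _ in range(steps):
        i = rng.integers(len(y))
        # The stochastic gradient is a multiple of x_i, so w moves only within span{x_i}.
        w -= lr * (X[i] @ w - y[i]) * X[i]
        traj.append(w.copy())
    return np.array(traj)


rng = np.random.default_rng(0)
n, p = 3, 50  # hypothetical toy sizes: far more parameters than samples
X = rng.normal(size=(n, p)) / np.sqrt(p)
y = rng.normal(size=n)

# The Hessian of the mean loss is X^T X / n. It has rank <= n, so p - n eigenvalues are zero.
hess_eigs = np.linalg.eigvalsh(X.T @ X / n)
print(int(np.sum(hess_eigs > 1e-10)))  # → 3

traj = sgd_trajectory(X, y, rng.normal(size=p), rng)
# Singular values of the centered iterates: at most n of them are non-negligible.
sing = np.linalg.svd(traj - traj.mean(axis=0), compute_uv=False)
coords, dm_evals = diffusion_map(traj, n_components=2)
```

In this linear toy the low-dimensional set is a flat affine subspace, so PCA would find it as well. Diffusion maps become interesting when the iterates lie near a curved manifold, because the same construction still applies there. The bandwidth `epsilon` strongly affects the embedding and would likely need tuning on real optimizer data.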
