Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:META-STORM: Generalized Fully-Adaptive Variance Reduced SGD for Unbounded Functions

Sep 29, 2022

Zijian Liu, Ta Duy Nguyen, Thien Hang Nguyen, Alina Ene, Huy L. Nguyen

Figure 1 for META-STORM: Generalized Fully-Adaptive Variance Reduced SGD for Unbounded Functions

Figure 2 for META-STORM: Generalized Fully-Adaptive Variance Reduced SGD for Unbounded Functions

Figure 3 for META-STORM: Generalized Fully-Adaptive Variance Reduced SGD for Unbounded Functions

Figure 4 for META-STORM: Generalized Fully-Adaptive Variance Reduced SGD for Unbounded Functions

Share this with someone who'll enjoy it:

Abstract:We study the application of variance reduction (VR) techniques to general non-convex stochastic optimization problems. In this setting, the recent work STORM [Cutkosky-Orabona '19] overcomes the drawback of having to compute gradients of "mega-batches" that earlier VR methods rely on. There, STORM utilizes recursive momentum to achieve the VR effect and is then later made fully adaptive in STORM+ [Levy et al., '21], where full-adaptivity removes the requirement for obtaining certain problem-specific parameters such as the smoothness of the objective and bounds on the variance and norm of the stochastic gradients in order to set the step size. However, STORM+ crucially relies on the assumption that the function values are bounded, excluding a large class of useful functions. In this work, we propose META-STORM, a generalized framework of STORM+ that removes this bounded function values assumption while still attaining the optimal convergence rate for non-convex optimization. META-STORM not only maintains full-adaptivity, removing the need to obtain problem specific parameters, but also improves the convergence rate's dependency on the problem parameters. Furthermore, META-STORM can utilize a large range of parameter settings that subsumes previous methods allowing for more flexibility in a wider range of settings. Finally, we demonstrate the effectiveness of META-STORM through experiments across common deep learning tasks. Our algorithm improves upon the previous work STORM+ and is competitive with widely used algorithms after the addition of per-coordinate update and exponential moving average heuristics.

View paper on

OpenReview

Share this with someone who'll enjoy it:

Title:META-STORM: Generalized Fully-Adaptive Variance Reduced SGD for Unbounded Functions

Paper and Code