Adaptive stochastic gradient descent methods, such as AdaGrad, RMSProp, Adam, and AMSGrad, have proven effective for non-convex stochastic optimization problems such as training deep neural networks. However, their convergence rates in the non-convex stochastic setting remain largely unexplored, apart from recent breakthrough results on AdaGrad, perturbed AdaGrad, and AMSGrad. In this paper, we propose two new adaptive stochastic gradient methods, AdaHB and AdaNAG, which integrate a novel weighted coordinate-wise AdaGrad with heavy ball momentum and Nesterov accelerated gradient momentum, respectively. We jointly establish $\mathcal{O}(\frac{\log{T}}{\sqrt{T}})$ non-asymptotic convergence rates for AdaHB and AdaNAG in the non-convex stochastic setting by leveraging a newly developed unified formulation of these two momentum mechanisms. Moreover, we compare AdaHB and AdaNAG with Adam and RMSProp, which to a certain extent explains why Adam and RMSProp can diverge. In particular, when the momentum term vanishes, we obtain the convergence rate of coordinate-wise AdaGrad in the non-convex stochastic setting as a byproduct.
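To make the ingredients concrete, the following is a minimal, illustrative Python/NumPy sketch of a coordinate-wise AdaGrad accumulator combined with heavy ball momentum on a toy non-convex stochastic problem. The abstract does not specify the weighting scheme of AdaHB, its hyperparameters, or the unified momentum formulation, so the update rule, the function name `adahb_like_step`, and the constants below are illustrative assumptions rather than the authors' algorithm.

```python
import numpy as np

# Illustrative sketch only: a coordinate-wise AdaGrad accumulator combined with
# heavy ball momentum. The exact weighting used by AdaHB is an assumption here.

def adahb_like_step(x, m, G, grad, lr=0.1, beta=0.9, eps=1e-8):
    """One update: G accumulates squared gradients per coordinate (AdaGrad),
    m is a heavy ball momentum buffer, and x is the current iterate."""
    G = G + grad ** 2                   # coordinate-wise accumulation of squared gradients
    step = grad / (np.sqrt(G) + eps)    # AdaGrad-scaled gradient direction
    m = beta * m + lr * step            # heavy ball momentum update
    x = x - m                           # iterate update
    return x, m, G

# Toy non-convex stochastic objective f(x) = sin(x_0) + 0.1*x_0^2 + x_1^2
# observed through noisy gradients.
rng = np.random.default_rng(0)
x = np.array([2.0, -1.5])
m = np.zeros_like(x)
G = np.zeros_like(x)

for t in range(200):
    noise = 0.1 * rng.standard_normal(x.shape)
    grad = np.array([np.cos(x[0]) + 0.2 * x[0], 2.0 * x[1]]) + noise  # stochastic gradient
    x, m, G = adahb_like_step(x, m, G, grad)

print("final iterate:", x)
```

Replacing the heavy ball buffer with a Nesterov-style lookahead would give an analogous sketch in the spirit of AdaNAG; setting `beta = 0` recovers plain coordinate-wise AdaGrad, mirroring the byproduct noted above.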