Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xiaokang Pan

Stability and Generalization for Stochastic Recursive Momentum-based Algorithms for (Strongly-)Convex One to $K$-Level Stochastic Optimizations

Jul 07, 2024

Xiaokang Pan, Xingyu Li, Jin Liu, Tao Sun, Kai Sun, Lixing Chen, Zhe Qu

Figure 1 for Stability and Generalization for Stochastic Recursive Momentum-based Algorithms for (Strongly-)Convex One to $K$-Level Stochastic Optimizations

Figure 2 for Stability and Generalization for Stochastic Recursive Momentum-based Algorithms for (Strongly-)Convex One to $K$-Level Stochastic Optimizations

Figure 3 for Stability and Generalization for Stochastic Recursive Momentum-based Algorithms for (Strongly-)Convex One to $K$-Level Stochastic Optimizations

Abstract:STOchastic Recursive Momentum (STORM)-based algorithms have been widely developed to solve one to $K$-level ($K \geq 3$) stochastic optimization problems. Specifically, they use estimators to mitigate the biased gradient issue and achieve near-optimal convergence results. However, there is relatively little work on understanding their generalization performance, particularly evident during the transition from one to $K$-level optimization contexts. This paper provides a comprehensive generalization analysis of three representative STORM-based algorithms: STORM, COVER, and SVMR, for one, two, and $K$-level stochastic optimizations under both convex and strongly convex settings based on algorithmic stability. Firstly, we define stability for $K$-level optimizations and link it to generalization. Then, we detail the stability results for three prominent STORM-based algorithms. Finally, we derive their excess risk bounds by balancing stability results with optimization errors. Our theoretical results provide strong evidence to complete STORM-based algorithms: (1) Each estimator may decrease their stability due to variance with its estimation target. (2) Every additional level might escalate the generalization error, influenced by the stability and the variance between its cumulative stochastic gradient and the true gradient. (3) Increasing the batch size for the initial computation of estimators presents a favorable trade-off, enhancing the generalization performance.

Via

Access Paper or Ask Questions

Breaking the Complexity Barrier in Compositional Minimax Optimization

Aug 18, 2023

Jin Liu, Xiaokang Pan, Junwen Duan, Hongdong Li, Youqi Li, Zhe Qu

Abstract:Compositional minimax optimization is a pivotal yet under-explored challenge across machine learning, including distributionally robust training and policy evaluation for reinforcement learning. Current techniques exhibit suboptimal complexity or rely heavily on large batch sizes. This paper proposes Nested STOchastic Recursive Momentum (NSTORM), attaining the optimal sample complexity of $O(\kappa^3/\epsilon^3)$ for finding an $\epsilon$-accurate solution. However, NSTORM requires low learning rates, potentially limiting applicability. Thus we introduce ADAptive NSTORM (ADA-NSTORM) with adaptive learning rates, proving it achieves the same sample complexity while experiments demonstrate greater effectiveness. Our methods match lower bounds for minimax optimization without large batch requirements, validated through extensive experiments. This work significantly advances compositional minimax optimization, a crucial capability for distributional robustness and policy evaluation

Via

Access Paper or Ask Questions