Abstract: This paper develops novel shuffling gradient-based methods for tackling two classes of minimax problems: nonconvex-linear and nonconvex-strongly concave settings. The first algorithm addresses the nonconvex-linear minimax model and achieves the state-of-the-art oracle complexity typically observed in nonconvex optimization. It also employs a new shuffling estimator for the "hyper-gradient", departing from standard shuffling techniques in optimization. The second method consists of two variants: semi-shuffling and full-shuffling schemes. These variants tackle the nonconvex-strongly concave minimax setting. We establish their oracle complexity bounds under standard assumptions, which, to the best of our knowledge, are the best known for this specific setting. Numerical examples demonstrate the performance of our algorithms and compare them with two other methods. Our results show that the new methods achieve performance comparable to SGD, supporting the potential of incorporating shuffling strategies into minimax algorithms.
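To make the shuffling idea concrete, here is a minimal sketch of one random-reshuffling epoch of alternating stochastic descent-ascent over a finite sum of $n$ components; the per-sample oracles `grad_x` and `grad_y` are hypothetical interfaces, and the paper's hyper-gradient estimator and step-size schedules are more involved than this.

```python
import numpy as np

def shuffling_minimax_epoch(x, y, grad_x, grad_y, n, eta_x, eta_y, rng):
    """One random-reshuffling epoch of alternating descent-ascent on a
    finite-sum minimax objective. grad_x(x, y, i) and grad_y(x, y, i)
    return the i-th component's partial gradients (hypothetical API)."""
    for i in rng.permutation(n):         # reshuffle the n samples once per epoch
        x = x - eta_x * grad_x(x, y, i)  # descent step on the primal variable
        y = y + eta_y * grad_y(x, y, i)  # ascent step on the dual variable
    return x, y
```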
Abstract: This paper presents a comprehensive analysis of the well-known extragradient (EG) method for solving both equations and inclusions. First, we unify and generalize EG for [non]linear equations to a wider class of algorithms, encompassing various existing schemes and potentially new variants. We then analyze both sublinear ``best-iterate'' and ``last-iterate'' convergence rates for the entire class of algorithms and derive new convergence results for two well-known instances. Second, we extend our EG framework to ``monotone'' inclusions, introducing a new class of algorithms and its corresponding convergence results. Third, we unify and generalize Tseng's forward-backward-forward splitting (FBFS) method to a broader class of algorithms for solving [non]linear inclusions when a weak-Minty solution exists, and establish its ``best-iterate'' convergence rate. Fourth, to complete the picture, we investigate sublinear rates of two other common variants of EG using the EG analysis framework developed here: the reflected forward-backward splitting and the golden ratio methods. Finally, we conduct extensive numerical experiments to validate our theoretical findings. Our results demonstrate that several new variants of our proposed algorithms outperform existing schemes in the majority of examples.
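For reference, the classical two-call EG iteration that this class generalizes can be sketched as follows for an unconstrained operator equation $F(x) = 0$; this is the textbook scheme, not the unified algorithm of the paper.

```python
import numpy as np

def extragradient(F, x0, gamma, iters):
    """Classical extragradient iteration: an extrapolation step followed
    by the actual update, each using one evaluation of the operator F."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        y = x - gamma * F(x)   # extrapolation (first operator call)
        x = x - gamma * F(y)   # update (second operator call)
    return x
```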
Abstract: We propose a novel class of Nesterov's stochastic accelerated forward-reflected-based methods with variance reduction to solve root-finding problems under $\frac{1}{L}$-co-coerciveness. Our algorithm is single-loop and leverages a new family of unbiased variance-reduced estimators specifically designed for root-finding problems. It achieves both $\mathcal{O}(L^2/k^2)$ and $o(1/k^2)$ last-iterate convergence rates in terms of the expected squared operator norm, where $k$ denotes the iteration counter. We instantiate our framework for two prominent estimators: SVRG and SAGA. With an appropriate choice of parameters, both variants attain an oracle complexity of $\mathcal{O}(n + Ln^{2/3}\epsilon^{-1})$ to reach an $\epsilon$-solution, where $n$ represents the number of summands in the finite-sum operator. Furthermore, under $\mu$-strong quasi-monotonicity, our method achieves a linear convergence rate and an oracle complexity of $\mathcal{O}(n + \kappa n^{2/3}\log(\epsilon^{-1}))$, where $\kappa := \frac{L}{\mu}$. We extend our approach to solve a class of finite-sum monotone inclusions, demonstrating that our schemes retain the same theoretical guarantees as in the equation setting. Finally, numerical experiments validate our algorithms and demonstrate their promising performance compared to state-of-the-art methods.
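As a point of reference, a generic SVRG-type unbiased estimator for a finite-sum operator $G = \frac{1}{n}\sum_{i=1}^n G_i$ can be sketched as below; the paper's estimator family is tailored to its accelerated scheme, so this shows only the standard construction.

```python
def svrg_estimator(G_i, x, w, G_full_w, i):
    """Generic SVRG-style estimator of G(x) for G = (1/n) * sum_i G_i:
    the full operator value G_full_w = G(w) at a snapshot point w is
    corrected by the change in one sampled component i. It is unbiased
    since E_i[G_i(x) - G_i(w)] = G(x) - G(w)."""
    return G_i(x, i) - G_i(w, i) + G_full_w
```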
Abstract: The Stochastic Gradient Descent method (SGD) and its stochastic variants have become the methods of choice for solving finite-sum optimization problems arising in machine learning and data science, thanks to their ability to handle large-scale applications and big datasets. Over the last decades, researchers have made substantial efforts to study the theoretical performance of SGD and its shuffling variants. However, only limited work has investigated its shuffling momentum variants, including shuffling heavy-ball momentum schemes for nonconvex problems and Nesterov's momentum for convex settings. In this work, we extend the analysis of the shuffling momentum gradient method developed in [Tran et al. (2021)] to both finite-sum convex and strongly convex optimization problems. We provide the first analysis of shuffling momentum-based methods for the strongly convex setting, attaining a convergence rate of $O(1/nT^2)$, where $n$ is the number of samples and $T$ is the number of training epochs. Our analysis is state-of-the-art, matching the best rates of existing shuffling stochastic gradient algorithms in the literature.
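One epoch of a shuffling gradient method with heavy-ball momentum can be sketched as follows; this is a minimal illustration in the spirit of the analyzed scheme, with `grad` a hypothetical per-sample gradient oracle and without the paper's learning-rate schedule.

```python
import numpy as np

def shuffling_momentum_epoch(x, v, grad, n, lr, beta, rng):
    """One epoch of shuffling SGD with heavy-ball momentum: draw a single
    random permutation, then sweep through the n samples incrementally.
    grad(x, i) returns the gradient of the i-th component at x."""
    for i in rng.permutation(n):
        v = beta * v + grad(x, i)   # heavy-ball momentum buffer
        x = x - lr * v              # parameter update
    return x, v
```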
Abstract: The extragradient method (EG), introduced by G. M. Korpelevich in 1976, is a well-known method to approximate solutions of saddle-point problems and their extensions, such as variational inequalities and monotone inclusions. Over the years, numerous variants of EG have been proposed and studied in the literature. Recently, these methods have gained popularity due to new applications in machine learning and robust optimization. In this work, we survey the latest developments of the EG method and its variants for approximating solutions of nonlinear equations and inclusions, with a focus on the monotonicity and co-hypomonotonicity settings. We provide a unified convergence analysis for different classes of algorithms, with an emphasis on sublinear best-iterate and last-iterate convergence rates. We also discuss recent accelerated variants of EG based on both the Halpern fixed-point iteration and Nesterov's acceleration techniques. Our approach uses simple arguments and basic mathematical tools to keep the proofs as elementary as possible, while maintaining enough generality to cover a broad range of problems.
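For orientation, the classical EG update surveyed here reads, for a variational inequality with an $L$-Lipschitz monotone operator $F$ over a closed convex set $\mathcal{X}$ (a standard textbook statement):
\[
\begin{aligned}
y_k &= \mathrm{proj}_{\mathcal{X}}\big(x_k - \gamma F(x_k)\big), \\
x_{k+1} &= \mathrm{proj}_{\mathcal{X}}\big(x_k - \gamma F(y_k)\big),
\end{aligned}
\qquad 0 < \gamma < \tfrac{1}{L}.
\]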
Abstract: In this paper, we develop two ``Nesterov's accelerated'' variants of the well-known extragradient method to approximate a solution of a co-hypomonotone inclusion given by the sum of two operators, where one is Lipschitz continuous and the other is possibly multivalued. The first scheme can be viewed as an accelerated variant of Tseng's forward-backward-forward splitting method, while the second is a variant of the reflected forward-backward splitting method, which requires only one evaluation of the Lipschitz operator and one resolvent of the multivalued operator per iteration. Under a proper choice of the algorithmic parameters and appropriate conditions on the co-hypomonotonicity parameter, we theoretically prove that both algorithms achieve $\mathcal{O}(1/k)$ convergence rates on the norm of the residual, where $k$ is the iteration counter. Our results can be viewed as alternatives to a recent class of Halpern-type schemes for root-finding problems.
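The base (non-accelerated) reflected forward-backward step underlying the second scheme can be sketched as follows for the inclusion $0 \in F(x) + T(x)$; the Nesterov-type acceleration and parameter conditions of the paper are omitted, and `resolvent` stands for $J_{\gamma T} = (I + \gamma T)^{-1}$.

```python
def reflected_fbs_step(x, x_prev, F, resolvent, gamma):
    """One reflected forward-backward step for 0 in F(x) + T(x): a single
    evaluation of the Lipschitz operator F at the reflected point
    2*x - x_prev, then one resolvent of the multivalued operator T."""
    x_next = resolvent(x - gamma * F(2 * x - x_prev), gamma)
    return x_next, x   # new iterate and the new "previous" point
```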
Abstract: In this paper, we develop an accelerated randomized block-coordinate algorithm to approximate a solution of a co-coercive equation. Such an equation plays a central role in optimization and related fields and covers many mathematical models as special cases, including convex optimization, convex-concave minimax, and variational inequality problems. Our algorithm relies on a recent interpretation of the Halpern fixed-point iteration through Nesterov's acceleration in [48]. We establish that the new algorithm achieves an $\mathcal{O}(1/k^2)$ convergence rate on $\mathbb{E}[\Vert Gx^k\Vert^2]$ at the last iterate, where $G$ is the underlying co-coercive operator, $\mathbb{E}[\cdot]$ is the expectation, and $k$ is the iteration counter. This rate is significantly faster than the $\mathcal{O}(1/k)$ rates of standard forward or gradient-based methods from the literature. We also prove $o(1/k^2)$ rates on both $\mathbb{E}[\Vert Gx^k\Vert^2]$ and $\mathbb{E}[\Vert x^{k+1} - x^{k}\Vert^2]$. Next, we apply our method to derive two accelerated randomized block-coordinate variants of the forward-backward splitting and Douglas-Rachford splitting schemes, respectively, for solving a monotone inclusion involving the sum of two operators. As a byproduct, these variants also enjoy faster convergence rates than their non-accelerated counterparts. Finally, we apply our scheme to a finite-sum monotone inclusion that has various applications in machine learning and statistical learning, including federated learning. As a result, we obtain a novel federated learning-type algorithm with fast and provable convergence rates.
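A bare-bones sketch of the underlying idea, with Halpern anchoring weight $\beta_k = 1/(k+2)$ and one randomly sampled coordinate block updated per iteration, might look as follows; the paper's accelerated scheme uses a refined Nesterov-type reformulation and different parameters, so this is illustrative only.

```python
import numpy as np

def rbc_halpern(G, x0, n_blocks, eta, iters, rng):
    """Illustrative randomized block-coordinate Halpern-style iteration
    for a co-coercive equation G(x) = 0: anchor toward the starting
    point x0 and update one random coordinate block per iteration."""
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    blocks = np.array_split(np.arange(x.size), n_blocks)
    for k in range(iters):
        beta = 1.0 / (k + 2)                 # Halpern anchoring weight
        b = blocks[rng.integers(n_blocks)]   # sample one coordinate block
        g = G(x)                             # in practice only block b of G is needed
        x[b] = beta * x0[b] + (1 - beta) * (x[b] - eta * g[b])
    return x
```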
Abstract: In this book chapter, we briefly describe the main components that constitute the gradient descent method and its accelerated and stochastic variants. We aim to explain these components from a mathematical point of view, covering both theoretical and practical aspects, but at an elementary level. We focus on basic variants of the gradient descent method and then extend our view to recent variants, especially variance-reduced stochastic gradient descent (SGD) schemes. Our approach relies on revealing the structures present in the problem and the assumptions imposed on the objective function. Our convergence analysis unifies several known results and relies on a general but elementary recursive expression. We illustrate this analysis on several common schemes.
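The basic scheme the chapter starts from is plain gradient descent, $x_{k+1} = x_k - \eta \nabla f(x_k)$; a minimal sketch with a toy quadratic example:

```python
import numpy as np

def gradient_descent(grad, x0, lr, iters):
    """Plain gradient descent: repeat x <- x - lr * grad(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - lr * grad(x)
    return x

# Toy example: f(x) = 0.5 * ||x||^2 has gradient x and minimizer x* = 0.
x_final = gradient_descent(lambda x: x, np.ones(3), lr=0.5, iters=50)
```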
Abstract: In this paper, we develop a new class of accelerated algorithms to solve some classes of maximally monotone equations as well as monotone inclusions. Instead of using Nesterov's acceleration approach, our methods rely on a so-called Halpern-type fixed-point iteration from [32], recently exploited by a number of researchers, including [24, 70]. First, we derive a new variant of the anchored extragradient scheme in [70] based on Popov's past extragradient method to solve a maximally monotone equation $G(x) = 0$. We show that our method achieves the same $\mathcal{O}(1/k)$ convergence rate (up to a constant factor) as the anchored extragradient algorithm on the operator norm $\Vert G(x_k)\Vert$, but requires only one evaluation of $G$ per iteration, where $k$ is the iteration counter. Next, we develop two splitting algorithms to approximate a zero point of the sum of two maximally monotone operators. The first algorithm originates from the anchored extragradient method combined with a splitting technique, while the second one is its Popov variant, which reduces the per-iteration complexity. Both algorithms appear to be new and can be viewed as accelerated variants of the Douglas-Rachford (DR) splitting method. They both achieve $\mathcal{O}(1/k)$ rates on the norm $\Vert G_{\gamma}(x_k)\Vert$ of the forward-backward residual operator $G_{\gamma}(\cdot)$ associated with the problem. We also propose a new accelerated Douglas-Rachford splitting scheme for this problem, which achieves an $\mathcal{O}(1/k)$ convergence rate on $\Vert G_{\gamma}(x_k)\Vert$ under only maximal monotonicity assumptions. Finally, we specialize our first algorithm to solve convex-concave minimax problems and apply our accelerated DR scheme to derive a new variant of the alternating direction method of multipliers (ADMM).
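A rough sketch of the one-call idea: combine Popov's past-extragradient extrapolation (which reuses the previous operator value) with a Halpern anchor toward the initial point $x^0$. The anchoring weights and step size below ($\beta_k = 1/(k+2)$, fixed $\gamma$) are placeholders, not the paper's exact parameters.

```python
import numpy as np

def anchored_popov(G, x0, gamma, iters):
    """Illustrative anchored Popov iteration for G(x) = 0: only ONE
    evaluation of G per iteration, since the extrapolation step reuses
    the operator value computed in the previous iteration."""
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    g_prev = G(x)                                        # initial operator value
    for k in range(iters):
        beta = 1.0 / (k + 2)                             # Halpern anchoring weight
        y = beta * x0 + (1 - beta) * x - gamma * g_prev  # extrapolation reuses g_prev
        g_prev = G(y)                                    # the single evaluation this step
        x = beta * x0 + (1 - beta) * x - gamma * g_prev  # anchored update
    return x
```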
Abstract: In this paper, we develop two new algorithms, called \textbf{FedDR} and \textbf{asyncFedDR}, for solving a fundamental nonconvex optimization problem in federated learning. Our algorithms rely on a novel combination of a nonconvex Douglas-Rachford splitting method, randomized block-coordinate strategies, and asynchronous implementation. Unlike recent methods in the literature, e.g., FedSplit and FedPD, our algorithms update only a subset of users at each communication round, possibly in an asynchronous mode, making them more practical. These new algorithms also achieve communication efficiency and, more importantly, can handle statistical and system heterogeneity, the two main challenges in federated learning. Our convergence analysis shows that the new algorithms match the communication complexity lower bound up to a constant factor under standard assumptions. Our numerical experiments illustrate the advantages of the proposed methods over existing ones on both synthetic and real datasets.
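A hedged sketch of one FedDR-style synchronous round: each active user performs a Douglas-Rachford-type update (a correction toward the server point, a local proximal step, and a reflection), and the server averages the reflected points. Variable names, the sampling rule, and `prox_f` (a proximal oracle for user $i$'s local objective) are illustrative, not the exact update of the paper.

```python
import numpy as np

def feddr_round(x_bar, y, x, x_hat, prox_f, alpha, eta, frac, rng):
    """One illustrative FedDR-style round. Arrays y, x, x_hat have shape
    (n_users, d); only a random subset of users is active this round."""
    n_users = y.shape[0]
    active = rng.choice(n_users, size=max(1, int(frac * n_users)), replace=False)
    for i in active:
        y[i] = y[i] + alpha * (x_bar - x[i])  # DR correction toward the server point
        x[i] = prox_f(i, y[i], eta)           # local proximal (training) step
        x_hat[i] = 2.0 * x[i] - y[i]          # reflected point sent to the server
    x_bar = x_hat.mean(axis=0)                # server aggregation
    return x_bar, y, x, x_hat
```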