Abstract: This paper focuses on the online saddle point problem, which involves a sequence of two-player time-varying convex-concave games. To account for the nonstationarity of the environment, we adopt the duality gap and the dynamic Nash equilibrium regret as performance metrics for algorithm design. We present three variants of the proximal point method: the Online Proximal Point Method~(OPPM), the Optimistic OPPM~(OptOPPM), and OptOPPM with multiple predictors. Each algorithm guarantees an upper bound on both the duality gap and the dynamic Nash equilibrium regret, and is near-optimal with respect to the duality gap. In certain benign environments, such as sequences of stationary payoff functions, these algorithms maintain nearly constant bounds on both metrics. Experimental results further validate the effectiveness of these algorithms. Finally, we discuss potential reliability concerns in using the dynamic Nash equilibrium regret as a performance metric.
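As a rough illustration of the implicit (proximal) update behind a method like OPPM, the sketch below solves one proximal step in closed form for a bilinear payoff $f_t(x, y) = x^\top A_t y$. The bilinear assumption, the unconstrained domain (the paper's setting presumably projects onto constraint sets), and all names (`oppm_step`, `eta`) are illustrative choices, not details from the paper.

```python
import numpy as np

def oppm_step(x, y, A, eta):
    """One proximal (implicit) update for the bilinear payoff
    f_t(x, y) = x^T A y: jointly solve
        min_x max_y  x^T A y + ||x - x_t||^2/(2 eta) - ||y - y_t||^2/(2 eta).
    Setting both partial gradients to zero gives a linear system."""
    d = x.shape[0]
    # Stationarity: A y+ + (x+ - x)/eta = 0 and A^T x+ - (y+ - y)/eta = 0;
    # eliminating y+ yields (I + eta^2 A A^T) x+ = x - eta A y.
    x_new = np.linalg.solve(np.eye(d) + eta**2 * (A @ A.T), x - eta * A @ y)
    y_new = y + eta * A.T @ x_new
    return x_new, y_new

# Toy run over a time-varying sequence of payoff matrices A_t.
rng = np.random.default_rng(0)
x, y = np.ones(3), np.ones(2)
for _ in range(100):
    x, y = oppm_step(x, y, rng.standard_normal((3, 2)), eta=0.1)
```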
Abstract: Centered on solving the online saddle point problem, this paper introduces the Online Convex-Concave Optimization (OCCO) framework, which involves a sequence of two-player time-varying convex-concave games. We propose the generalized duality gap (Dual-Gap) as the performance metric and establish a parallel between OCCO with Dual-Gap and Online Convex Optimization (OCO) with regret. To demonstrate that OCCO is a natural extension of OCO, we develop two algorithms, implicit online mirror descent-ascent and its optimistic variant. Our analysis reveals that their duality gaps take the same form as the corresponding dynamic regrets arising from implicit updates in OCO. Empirical results further substantiate the effectiveness of our algorithms. Finally, we show that the dynamic Nash equilibrium regret, introduced in a recent paper, has inherent defects.
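To convey the flavor of the optimistic variant, here is a minimal Euclidean sketch of optimistic descent-ascent that uses last round's gradients as predictors. It is explicit rather than implicit, and the ball constraint and the `grads` callback interface are assumptions, so it should be read as a sketch of the optimistic mechanism rather than the paper's exact algorithm.

```python
import numpy as np

def project_ball(v, radius=1.0):
    """Euclidean projection onto the centered ball of the given radius."""
    n = np.linalg.norm(v)
    return v if n <= radius else v * (radius / n)

def optimistic_descent_ascent(grads, T, dx, dy, eta=0.1):
    """grads(t, x, y) -> (g_x, g_y), the payoff gradients at the played point.
    Each round plays against a gradient prediction (here, last round's
    gradients), then corrects the secondary iterates with the observed ones."""
    x_hat, y_hat = np.zeros(dx), np.zeros(dy)
    mx, my = np.zeros(dx), np.zeros(dy)          # gradient predictions
    plays = []
    for t in range(T):
        x = project_ball(x_hat - eta * mx)       # descent on x vs. prediction
        y = project_ball(y_hat + eta * my)       # ascent on y vs. prediction
        gx, gy = grads(t, x, y)
        x_hat = project_ball(x_hat - eta * gx)   # correction step
        y_hat = project_ball(y_hat + eta * gy)
        mx, my = gx, gy                          # next round's predictions
        plays.append((x, y))
    return plays
```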
Abstract: In this paper, we study optimistic online convex optimization in dynamic environments. Existing work has shown that Ader enjoys an $O\left(\sqrt{\left(1+P_T\right)T}\right)$ dynamic regret upper bound, where $T$ is the number of rounds and $P_T$ is the path length of the reference strategy sequence. However, Ader is not environment-adaptive. Since optimism provides a framework for achieving environment adaptivity, we replace Greedy Projection (GP) and Normalized Exponentiated Subgradient (NES) in Ader with Optimistic-GP and Optimistic-NES, respectively, and name the resulting algorithm ONES-OGP. We also extend the doubling trick to the adaptive trick, and introduce three characteristic terms that arise naturally from optimism, namely $M_T$, $\widetilde{M}_T$, and $V_T+1_{L^2\rho\left(\rho+2 P_T\right)\leqslant\varrho^2 V_T}D_T$, to replace the dependence of the dynamic regret upper bound on $T$. We elaborate on ONES-OGP with the adaptive trick and its subgradient-variation version, all of which are environment-adaptive.
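The simplex-side building block can be sketched as follows: an optimistic exponentiated-subgradient update that steps on a loss prediction before correcting with the observed loss. This conveys only the flavor of the Optimistic-NES ingredient under an assumed interface (`losses`, `preds`, `eps` are illustrative names); the full ONES-OGP, its expert step-size grid, and the adaptive trick are not reproduced here.

```python
import numpy as np

def optimistic_nes(losses, preds, eps):
    """losses[t]: observed loss vector; preds[t]: its prediction (a common
    choice is preds[t] = losses[t-1]); eps: learning rate. Working in log
    space, each round steps on the prediction to pick the play, then
    corrects the secondary iterate with the true loss."""
    T, n = losses.shape
    log_w_hat = np.zeros(n)                      # secondary iterate (log space)
    plays = np.empty((T, n))
    for t in range(T):
        log_w = log_w_hat - eps * preds[t]       # optimistic step
        w = np.exp(log_w - log_w.max())
        plays[t] = w / w.sum()                   # normalize onto the simplex
        log_w_hat -= eps * losses[t]             # correction step
    return plays
```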
Abstract: We present a unified analysis method for online optimization in normed vector spaces that relies on the generalized cosine rule and $\phi$-convexity, using dynamic regret as the performance metric. In combining the update rules, we start with strategy $S$ (a two-parameter variant strategy covering Optimistic-FTRL with surrogate linearized losses) and obtain $S$-I (a type-I relaxed variant of $S$) and $S$-II (a type-II relaxed variant of $S$, which is Optimistic-MD) by relaxation. The regret bounds for $S$-I and $S$-II are the tightest possible. As instantiations, the regret bounds for normalized exponentiated subgradient and greedy/lazy projection improve on the best previously known results. By replacing the losses of the online game with monotone operators and extending the definition of regret, namely regret$^n$, we extend online convex optimization to online monotone optimization, which broadens the application scope of $S$-I and $S$-II.
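A minimal sketch of the $S$-II (Optimistic-MD) step with a pluggable mirror map may help fix ideas. The interface (`nabla_phi`, `nabla_phi_star`, `proj`) is hypothetical, and the entropic instantiation drops additive constants in $\nabla\phi$ that the simplex normalization cancels anyway.

```python
import numpy as np

def optimistic_md_step(x_hat, pred, grad, eta, nabla_phi, nabla_phi_star, proj):
    """One Optimistic-MD step for a mirror map phi: play against the
    prediction, then update the secondary iterate with the observed
    (sub)gradient; proj is the Bregman projection onto the feasible set."""
    x = proj(nabla_phi_star(nabla_phi(x_hat) - eta * pred))          # play
    x_hat_new = proj(nabla_phi_star(nabla_phi(x_hat) - eta * grad))  # correct
    return x, x_hat_new

# Entropic instantiation on the simplex recovers a normalized
# exponentiated subgradient step (multiplicative-weights form).
def nes_step(x_hat, pred, grad, eta):
    return optimistic_md_step(x_hat, pred, grad, eta,
                              nabla_phi=np.log, nabla_phi_star=np.exp,
                              proj=lambda v: v / v.sum())
```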