Siddharth Chandak

$O(1/k)$ Finite-Time Bound for Non-Linear Two-Time-Scale Stochastic Approximation

Apr 27, 2025

Siddharth Chandak

Abstract: Two-time-scale stochastic approximation is an algorithm with coupled iterations which has found broad applications in reinforcement learning, optimization and game control. While several prior works have obtained a mean square error bound of $O(1/k)$ for linear two-time-scale iterations, the best known bound in the non-linear contractive setting has been $O(1/k^{2/3})$. In this work, we obtain an improved bound of $O(1/k)$ for non-linear two-time-scale stochastic approximation. Our result applies to algorithms such as gradient descent-ascent and two-time-scale Lagrangian optimization. The key step in our analysis involves rewriting the original iteration in terms of an averaged noise sequence which decays sufficiently fast. Additionally, we use an induction-based approach to show that the iterates are bounded in expectation.

* Submitted to IEEE Transactions on Automatic Control
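
As a rough illustration of the coupled iterations this abstract describes, the sketch below runs a toy non-linear two-time-scale stochastic approximation. The maps `f` and `g`, the Gaussian noise, and the step-size exponents are all illustrative assumptions chosen so that the fixed point is `(0, 0)`; they are not the paper's setting or its step-size conditions.

```python
import math
import random


def two_time_scale_sa(num_iters=20000, noise_std=0.1, seed=0):
    """Toy coupled iteration with a fast iterate x and a slow iterate y.

    Assumed (hypothetical) maps, both contractive, with fixed point (0, 0):
        f(x, y) = 0.5 * x + 0.5 * tanh(y)   -> fast fixed point x*(y) = tanh(y)
        g(x, y) = 0.5 * y - 0.25 * x
    """
    rng = random.Random(seed)
    x, y = 1.0, 1.0
    for k in range(num_iters):
        alpha = 1.0 / (k + 1) ** 0.6  # fast step size (assumed exponent)
        beta = 1.0 / (k + 1)          # slow step size, so beta / alpha -> 0
        fx = 0.5 * x + 0.5 * math.tanh(y)
        gy = 0.5 * y - 0.25 * x
        noise_x = rng.gauss(0.0, noise_std)
        noise_y = rng.gauss(0.0, noise_std)
        # Both updates use the iterates from step k.
        x, y = x + alpha * (fx - x + noise_x), y + beta * (gy - y + noise_y)
    return x, y


if __name__ == "__main__":
    print(two_time_scale_sa())
```

With these assumed maps, both iterates should end up close to zero; the example only shows the structure of the updates and says nothing about the rate proved in the paper.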

Non-Expansive Mappings in Two-Time-Scale Stochastic Approximation: Finite-Time Analysis

Jan 18, 2025

Siddharth Chandak

Abstract: Two-time-scale stochastic approximation is an iterative algorithm used in applications such as optimization, reinforcement learning, and control. Finite-time analysis of these algorithms has primarily focused on fixed point iterations where both time-scales have contractive mappings. In this paper, we study two-time-scale iterations, where the slower time-scale has a non-expansive mapping. For such algorithms, the slower time-scale can be considered a stochastic inexact Krasnoselskii-Mann iteration. We show that the mean square error decays at a rate $O(1/k^{1/4-\epsilon})$, where $\epsilon>0$ is arbitrarily small. We also show almost sure convergence of iterates to the set of fixed points. We show the applicability of our framework by applying our results to minimax optimization, linear stochastic approximation, and Lagrangian optimization.

* Submitted to SIAM Journal on Control and Optimization
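
To make the Krasnoselskii-Mann idea concrete, here is a minimal sketch of a noisy averaged iteration $x_{k+1} = x_k + \beta_k (T(x_k) - x_k + \text{noise})$ for a non-expansive map. The choice of `T` (a 90-degree rotation of the plane, whose only fixed point is the origin), the Gaussian noise, and the step-size exponent are assumptions made for illustration. The noise merely stands in for the inexactness that, in the paper's setting, comes from the faster time-scale.

```python
import random


def rotate90(p):
    """A non-expansive (norm-preserving) map: rotate a 2D point by 90 degrees."""
    x, y = p
    return (-y, x)


def stochastic_km(num_iters=5000, noise_std=0.1, seed=0):
    """Noisy Krasnoselskii-Mann averaging for T = rotate90 (illustrative only)."""
    rng = random.Random(seed)
    p = (1.0, 0.0)
    for k in range(num_iters):
        beta = 1.0 / (k + 1) ** 0.75  # assumed diminishing step size
        tx, ty = rotate90(p)
        nx = rng.gauss(0.0, noise_std)
        ny = rng.gauss(0.0, noise_std)
        p = (p[0] + beta * (tx - p[0] + nx),
             p[1] + beta * (ty - p[1] + ny))
    return p


if __name__ == "__main__":
    print(stochastic_km())
```

Iterating `rotate90` alone just cycles around the unit circle and never converges, whereas the averaged iteration is drawn toward the fixed point at the origin. That contrast is the reason for averaging when the map is only non-expansive.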

Learning to Control Unknown Strongly Monotone Games

Jun 30, 2024

Siddharth Chandak, Ilai Bistritz, Nicholas Bambos

Abstract: Consider $N$ players each with a $d$-dimensional action set. Each of the players' utility functions includes their reward function and a linear term for each dimension, with coefficients that are controlled by the manager. We assume that the game is strongly monotone, so if each player runs gradient descent, the dynamics converge to a unique Nash equilibrium (NE). The NE is typically inefficient in terms of global performance. The resulting global performance of the system can be improved by imposing $K$-dimensional linear constraints on the NE. We therefore want the manager to pick the controlled coefficients that impose the desired constraint on the NE. However, this requires knowing the players' reward functions and their action sets. Obtaining this game structure information is infeasible in a large-scale network and violates the users' privacy. To overcome this, we propose a simple algorithm that learns to shift the NE of the game to meet the linear constraints by adjusting the controlled coefficients online. Our algorithm only requires the linear constraints violation as feedback and does not need to know the reward functions or the action sets. We prove that our algorithm, which is based on two time-scale stochastic approximation, guarantees convergence with probability 1 to the set of NE that meet target linear constraints. We then provide a mean square convergence rate of $O(t^{-1/4})$ for our algorithm. This is the first such bound for two time-scale stochastic approximation where the slower time-scale is a fixed point iteration with a non-expansive mapping. We demonstrate how our scheme can be applied to optimizing a global quadratic cost at NE and load balancing in resource allocation games. We provide simulations of our algorithm for these scenarios.

* Submitted to IEEE Transactions on Automatic Control

A Concentration Bound for TD with Function Approximation

Dec 16, 2023

Siddharth Chandak, Vivek S. Borkar

Abstract: We derive a concentration bound of the type "for all $n \geq n_0$ for some $n_0$" for TD(0) with linear function approximation. We work with online TD learning with samples from a single sample path of the underlying Markov chain. This makes our analysis significantly different from offline TD learning or TD learning with access to independent samples from the stationary distribution of the Markov chain. We treat TD(0) as a contractive stochastic approximation algorithm, with both martingale and Markov noises. Markov noise is handled using the Poisson equation and the lack of almost sure guarantees on boundedness of iterates is handled using the concept of relaxed concentration inequalities.

* Submitted to Stochastic Systems
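
For readers unfamiliar with the algorithm being analyzed, the following is a small sketch of online TD(0) with linear function approximation run along a single sample path. The two-state chain (uniform transitions), the rewards, the discount factor, the one-hot features, and the step-size schedule are all assumptions picked so that the exact values are easy to compute by hand: $V(0) = 1.5$ and $V(1) = 0.5$. It illustrates only the update rule, not the concentration bound.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def td0_linear(num_steps=50000, gamma=0.5, seed=0):
    """Online TD(0) on one trajectory of a toy 2-state Markov chain."""
    rng = random.Random(seed)
    features = [(1.0, 0.0), (0.0, 1.0)]  # one-hot features (assumed)
    rewards = [1.0, 0.0]                 # reward received in each state
    theta = [0.0, 0.0]                   # weights; value estimate is theta . phi(s)
    s = 0
    for n in range(num_steps):
        s_next = rng.randrange(2)        # next state is uniform over {0, 1}
        # Temporal-difference error for the observed transition.
        delta = (rewards[s]
                 + gamma * dot(theta, features[s_next])
                 - dot(theta, features[s]))
        step = 1.0 / (n + 1) ** 0.7      # assumed diminishing step size
        theta = [t + step * delta * f for t, f in zip(theta, features[s])]
        s = s_next
    return theta


if __name__ == "__main__":
    print(td0_linear())
```

Because consecutive samples come from the same trajectory rather than from independent draws, the noise in the update is Markovian, which is the feature of the online setting that the abstract emphasizes.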

Equilibrium Bandits: Learning Optimal Equilibria of Unknown Dynamics

Feb 27, 2023

Siddharth Chandak, Ilai Bistritz, Nicholas Bambos

Abstract: Consider a decision-maker that can pick one out of $K$ actions to control an unknown system, for $T$ turns. The actions are interpreted as different configurations or policies. Holding the same action fixed, the system asymptotically converges to a unique equilibrium, as a function of this action. The dynamics of the system are unknown to the decision-maker, which can only observe a noisy reward at the end of every turn. The decision-maker wants to maximize its accumulated reward over the $T$ turns. Learning what equilibria are better results in higher rewards, but waiting for the system to converge to equilibrium costs valuable time. Existing bandit algorithms, either stochastic or adversarial, achieve linear (trivial) regret for this problem. We present a novel algorithm, termed Upper Equilibrium Concentration Bound (UECB), that knows to switch an action quickly if it is not worth it to wait until the equilibrium is reached. This is enabled by employing convergence bounds to determine how far the system is from equilibrium. We prove that UECB achieves a regret of $\mathcal{O}(\log(T)+\tau_c\log(\tau_c)+\tau_c\log\log(T))$ for this equilibrium bandit problem where $\tau_c$ is the worst case approximate convergence time to equilibrium. We then show that both epidemic control and game control are special cases of equilibrium bandits, where $\tau_c\log \tau_c$ typically dominates the regret. We then test UECB numerically for both of these applications.

* Accepted at the 22nd International Conference on Autonomous Agents and Multiagent Systems (2023)

Reinforcement Learning in Non-Markovian Environments

Nov 03, 2022

Siddharth Chandak, Vivek S Borkar, Parth Dodhia

Abstract: Following the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitrary non-Markovian environments, we propose a related formulation inspired by classical stochastic control that reduces the problem to recursive computation of approximate sufficient statistics.

* 15 pages, submitted to Systems and Control Letters

A Concentration Bound for LSPE($\lambda$)

Nov 04, 2021

Vivek S. Borkar, Siddharth Chandak, Harsh Dolhare

Abstract: The popular LSPE($\lambda$) algorithm for policy evaluation is revisited to derive a concentration bound that gives high probability performance guarantees from some time on.

* 12 pages, submitted to JMLR

Concentration of Contractive Stochastic Approximation and Reinforcement Learning

Jun 27, 2021

Siddharth Chandak, Vivek S. Borkar

Abstract: Using a martingale concentration inequality, concentration bounds "from time $n_0$ on" are derived for stochastic approximation algorithms with contractive maps and both martingale difference and Markov noises. These are applied to reinforcement learning algorithms, in particular to asynchronous Q-learning and TD(0).

* 15 pages, Submitted to Stochastic Systems
