Abstract:Stochastic approximation (SA) is an iterative algorithm for finding the fixed point of an operator given noisy samples of that operator. SA appears in many areas such as optimization and Reinforcement Learning (RL). When implemented in practice, the noise that appears in the update of RL algorithms is naturally Markovian. Furthermore, in some settings, such as gradient TD, SA is employed in a two-time-scale manner. The combination of Markovian noise and the two-time-scale structure makes the algorithm difficult to analyze theoretically. In this paper, we characterize a tight convergence bound for the iterations of linear two-time-scale SA with Markovian noise. Our results show the convergence behavior of this algorithm under various choices of step sizes. Applying our result to the well-known TDC algorithm, we establish the first $O(1/\epsilon)$ sample complexity for the convergence of this algorithm, improving upon all previous work. Our results can similarly be applied to establish the convergence behavior of a variety of other RL algorithms, such as TD-learning with Polyak averaging, GTD, and GTD2.
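For concreteness, the sketch below illustrates the generic structure of a linear two-time-scale SA iteration: a slow main iterate and a fast auxiliary iterate driven by different step sizes. The matrices, vectors, and the i.i.d. noise standing in for Markovian noise are placeholders chosen only to give a stable system, not the operators analyzed in the paper; in TDC they would be built from features, rewards, and the discount factor.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Placeholder linear operators; in TDC these come from features, rewards,
# and the discount factor (assumption: a generic stable coupled system).
A11, A12 = -np.eye(d), 0.5 * np.eye(d)
A21, A22 = 0.3 * np.eye(d), -np.eye(d)
b1, b2 = np.zeros(d), np.zeros(d)

theta, w = rng.normal(size=d), rng.normal(size=d)
for k in range(1, 10_000):
    alpha = 1.0 / k          # slow step size for the main iterate theta
    beta = 1.0 / k ** 0.67   # faster step size for the auxiliary iterate w
    noise1, noise2 = rng.normal(size=d), rng.normal(size=d)  # i.i.d. stand-in for Markovian noise
    theta = theta + alpha * (b1 + A11 @ theta + A12 @ w + noise1)
    w = w + beta * (b2 + A21 @ theta + A22 @ w + noise2)
```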
Abstract:Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling observations from the environment is usually split across multiple agents. However, transferring these observations from the agents to a central location can be prohibitively expensive in terms of communication cost, and it can also compromise the privacy of each agent's local behavior policy. In this paper, we consider a federated reinforcement learning framework where multiple agents collaboratively learn a global model without sharing their individual data and policies. Each agent maintains a local copy of the model and updates it using locally sampled data. Although having $N$ agents enables the sampling of $N$ times more data, it is not clear whether this leads to a proportional convergence speedup. We propose federated versions of on-policy TD, off-policy TD, and Q-learning, and analyze their convergence. For all these algorithms, to the best of our knowledge, we are the first to consider Markovian noise and multiple local updates, and to prove a linear convergence speedup with respect to the number of agents. To obtain these results, we show that federated TD and Q-learning are special cases of a general framework for federated stochastic approximation with Markovian noise, and we leverage this framework to provide a unified convergence analysis that applies to all the algorithms.
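The following minimal sketch shows the federated pattern described above: each agent performs several local linear TD(0) updates between synchronizations, and a server averages the local models. The transition stream and features are hypothetical stand-ins for the agents' locally sampled data.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 8, 4          # number of agents, feature dimension
K, H = 200, 10       # communication rounds, local TD steps per round
gamma, alpha = 0.9, 0.05

def local_td_step(theta, agent_rng):
    # Hypothetical one-step linear TD(0) update on a locally sampled transition.
    phi = agent_rng.normal(size=d)          # feature of the current state
    phi_next = agent_rng.normal(size=d)     # feature of the next state
    r = agent_rng.normal()                  # reward
    td_error = r + gamma * phi_next @ theta - phi @ theta
    return theta + alpha * td_error * phi

theta_global = np.zeros(d)
for _ in range(K):
    local = []
    for _ in range(N):
        agent_rng = np.random.default_rng(rng.integers(1 << 31))
        theta_i = theta_global.copy()
        for _ in range(H):                  # multiple local updates between syncs
            theta_i = local_td_step(theta_i, agent_rng)
        local.append(theta_i)
    theta_global = np.mean(local, axis=0)   # server averages the local models
```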
Abstract:Machine learning algorithms are increasingly used for consequential decision making regarding individuals based on their relevant features. Features that are relevant for accurate decisions may, however, lead to either explicit or implicit forms of discrimination against unprivileged groups, such as those of a certain race or gender. This happens due to existing biases in the training data, which are often replicated or even exacerbated by the learning algorithm. Identifying and measuring these biases at the data level is a challenging problem due to the interdependence among the features and the decision outcome. In this work, we develop a framework for fairness-aware feature selection which takes into account the correlation among the features and the decision outcome, and is based on information theoretic measures for the accuracy and discriminatory impacts of features. In particular, we first propose information theoretic measures which quantify the impact of different subsets of features on the accuracy and discrimination of the decision outcomes. We then deduce the marginal impact of each feature using the Shapley value, a solution concept in cooperative game theory used to estimate the marginal contributions of players in a coalitional game. Finally, we design a fairness utility score for each feature (for feature selection) which quantifies how this feature influences accurate as well as non-discriminatory decisions. Our framework depends on the joint statistics of the data rather than on a particular classifier design. We evaluate our proposed framework on real and synthetic data.
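As an illustration of the Shapley-value step, the sketch below computes each feature's marginal contribution to two placeholder set functions standing in for the paper's information-theoretic accuracy and discrimination measures, and then combines them into a hypothetical fairness-utility score.

```python
from itertools import combinations
from math import factorial

features = ["f1", "f2", "f3"]

def accuracy(S):
    # Placeholder set function for the accuracy impact of a feature subset (assumption).
    return 0.1 * len(S)

def discrimination(S):
    # Placeholder set function for the discriminatory impact of a feature subset (assumption).
    return 0.05 * sum(g == "f2" for g in S)

def shapley(v, features):
    # Shapley value of each feature for the coalitional game with value function v.
    n = len(features)
    phi = {f: 0.0 for f in features}
    for f in features:
        others = [g for g in features if g != f]
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[f] += w * (v(set(S) | {f}) - v(set(S)))
    return phi

acc_phi = shapley(accuracy, features)
disc_phi = shapley(discrimination, features)
# Hypothetical fairness-utility score: reward accuracy impact, penalize discriminatory impact.
score = {f: acc_phi[f] - disc_phi[f] for f in features}
print(score)
```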
Abstract:In this paper, we develop a novel variant of the off-policy natural actor-critic algorithm with linear function approximation, and we establish a sample complexity of $\mathcal{O}(\epsilon^{-3})$, improving upon all previously known convergence bounds for such algorithms. In order to overcome the divergence caused by the deadly triad in off-policy policy evaluation under function approximation, we develop a critic that employs $n$-step TD-learning with a properly chosen $n$. We present finite-sample convergence bounds on this critic under both constant and diminishing step sizes, which are of independent interest. Furthermore, we develop a variant of natural policy gradient under function approximation with an improved convergence rate of $\mathcal{O}(1/T)$ after $T$ iterations. Combining the finite-sample error bounds of the actor and the critic, we obtain the $\mathcal{O}(\epsilon^{-3})$ sample complexity. We derive our sample complexity bounds solely under the assumption that the behavior policy sufficiently explores all states and actions, which is a much weaker assumption than those in the related literature.
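To make the critic concrete, here is a minimal sketch of an $n$-step linear TD update: the target accumulates $n$ discounted rewards and bootstraps from the value estimate $n$ steps ahead. The feature/reward stream is a hypothetical stand-in for trajectories generated by the behavior policy, and the choice of $n$ here is arbitrary rather than the properly chosen value analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, gamma, alpha = 4, 5, 0.9, 0.05
theta = np.zeros(d)

def sample_step():
    # Hypothetical (feature, reward) stream standing in for a trajectory
    # generated by the behavior policy (assumption).
    return rng.normal(size=d), rng.normal()

# Sliding window holding the last n+1 features and the associated rewards.
window = [sample_step() for _ in range(n + 1)]
for t in range(10_000):
    phi_t = window[0][0]
    rewards = [window[i][1] for i in range(n)]
    phi_tn = window[n][0]
    # n-step TD target: discounted rewards plus a bootstrap n steps ahead.
    G = sum(gamma ** i * rewards[i] for i in range(n)) + gamma ** n * (phi_tn @ theta)
    theta = theta + alpha * (G - phi_t @ theta) * phi_t
    window = window[1:] + [sample_step()]
```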
Abstract:Markov Decision Processes are classically solved using Value Iteration and Policy Iteration algorithms. Recent interest in Reinforcement Learning has motivated the study of methods inspired by optimization, such as gradient ascent. Among these, a popular algorithm is the Natural Policy Gradient, which is a mirror descent variant for MDPs. This algorithm forms the basis of several popular Reinforcement Learning algorithms, such as natural actor-critic, TRPO, and PPO, and is therefore being studied with growing interest. It has been shown that Natural Policy Gradient with constant step size converges to the global optimum at a sublinear rate of $O(1/k)$. In this paper, we present improved finite-time convergence bounds and show that this algorithm has a geometric (also known as linear) asymptotic convergence rate. We further improve this convergence result by introducing a variant of Natural Policy Gradient with adaptive step sizes. Finally, we compare different variants of policy gradient methods experimentally.
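The sketch below shows the tabular softmax form of the Natural Policy Gradient step, which reduces to a multiplicative-weights (mirror descent) update of the policy using the current Q-values. The policy-evaluation routine is a random placeholder, and $\eta$ is a fixed constant step size rather than the adaptive schedule introduced in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
S, A, eta = 3, 2, 0.5

pi = np.full((S, A), 1.0 / A)      # start from the uniform policy

def q_values(pi):
    # Placeholder for Q^pi; in practice this comes from a policy-evaluation
    # routine for the underlying MDP (assumption: any evaluator can be plugged in).
    return rng.normal(size=(S, A))

for k in range(100):
    Q = q_values(pi)
    # Tabular softmax NPG step: multiplicative-weights / mirror-descent update.
    pi = pi * np.exp(eta * Q)
    pi = pi / pi.sum(axis=1, keepdims=True)
```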
Abstract:In this paper, we provide finite-sample convergence guarantees for an off-policy variant of the natural actor-critic (NAC) algorithm based on importance sampling. In particular, we show that the algorithm converges to a globally optimal policy with a sample complexity of $\mathcal{O}(\epsilon^{-3}\log^2(1/\epsilon))$ under an appropriate choice of stepsizes. In order to overcome the issue of large variance due to importance sampling, we propose the $Q$-trace algorithm for the critic, which is inspired by the V-trace algorithm (Espeholt et al., 2018). This enables us to explicitly control the bias and variance, and to characterize the trade-off between them. As an advantage of off-policy sampling, a major feature of our result is that we do not need any assumptions beyond the ergodicity of the Markov chain induced by the behavior policy.
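The key variance-control device is the truncation of importance-sampling ratios. The sketch below shows a one-step critic update of that flavor, with a clipped ratio capped at $\bar{\rho}$; it is only a schematic in the spirit of V-trace/$Q$-trace, and the exact operator and multi-step corrections used in the paper may differ. The behavior policy, target policy, and transition kernel are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(4)
S, A = 4, 3
gamma, alpha = 0.9, 0.1
rho_bar = 1.0                              # truncation level controlling the bias/variance trade-off

Q = np.zeros((S, A))
mu = np.full((S, A), 1.0 / A)              # behavior policy (uniform, assumption)
pi = rng.dirichlet(np.ones(A), size=S)     # target policy (random, assumption)

s = rng.integers(S)
a = rng.integers(A)                        # action from the uniform behavior policy
for t in range(10_000):
    r = rng.normal()                       # hypothetical reward
    s_next = rng.integers(S)               # hypothetical transition kernel
    a_next = rng.integers(A)
    # Truncated importance-sampling ratio for the bootstrap action,
    # in the spirit of V-trace-style corrections.
    rho = min(rho_bar, pi[s_next, a_next] / mu[s_next, a_next])
    target = r + gamma * rho * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])
    s, a = s_next, a_next
```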
Abstract:We study the impact of pre-processing and post-processing for reducing discrimination in data-driven decision making. We first analyze the fundamental trade-off between fairness and accuracy in a pre-processing approach, and propose a design for a pre-processing module, based on a convex optimization program, which can be added before the original classifier. This leads to a fundamental lower bound on attainable discrimination given any acceptable distortion in the outcome. Furthermore, we reformulate an existing post-processing method in terms of our accuracy and fairness measures, which allows us to compare post-processing and pre-processing approaches. We show that under some mild conditions, pre-processing outperforms post-processing. Finally, we show that with an appropriate choice of the discrimination measure, the optimization problem for both the pre-processing and post-processing approaches reduces to a linear program and hence can be solved efficiently.
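As a toy illustration of how such a pre-processing problem can become a linear program, the sketch below optimizes a randomized mapping from (group, label) to an adjusted label, minimizing expected label flips subject to a bound on the gap in positive-outcome rates between the two groups. The joint statistics, the flip-count distortion, and the demographic-parity-style constraint are assumptions made for illustration; the accuracy and discrimination measures used in the paper may differ.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical joint statistics P(group, label); any estimate can be plugged in.
P = np.array([[0.3, 0.2],     # group 0: P(y=0), P(y=1)
              [0.1, 0.4]])    # group 1: P(y=0), P(y=1)
delta = 0.05                  # allowed gap in positive-outcome rates (assumed discrimination measure)

# Decision variables t[g, y] = P(yhat = 1 | group = g, label = y),
# flattened as x = [t00, t01, t10, t11].
# Expected distortion (label flips): sum_g P(g,0)*t_g0 + P(g,1)*(1 - t_g1);
# the constant term is dropped since it does not affect the minimizer.
c = np.array([P[0, 0], -P[0, 1], P[1, 0], -P[1, 1]])

# Positive-outcome rate per group: r_g = sum_y P(y|g) * t_gy.
Py_given_g = P / P.sum(axis=1, keepdims=True)
row = np.array([Py_given_g[0, 0], Py_given_g[0, 1],
                -Py_given_g[1, 0], -Py_given_g[1, 1]])
A_ub = np.vstack([row, -row])           # encodes |r_0 - r_1| <= delta
b_ub = np.array([delta, delta])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * 4)
t = res.x.reshape(2, 2)                 # optimal randomized pre-processing map
print(t)
```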
Abstract:Actor-critic-style two-time-scale algorithms are very popular in reinforcement learning and have seen great empirical success. However, their performance is not completely understood theoretically. In this paper, we characterize the global convergence of an online natural actor-critic algorithm in the tabular setting using a single trajectory. Our analysis applies to very general settings, as we only assume that the underlying Markov chain is ergodic under all policies (the so-called Recurrence assumption). We employ $\epsilon$-greedy sampling in order to ensure sufficient exploration. For a fixed exploration parameter $\epsilon$, we show that the natural actor-critic algorithm is $\mathcal{O}(\frac{1}{\epsilon T^{1/4}}+\epsilon)$ close to the global optimum after $T$ iterations of the algorithm. By carefully diminishing the exploration parameter $\epsilon$ as the iterations proceed, we also show convergence to the global optimum at a rate of $\mathcal{O}(1/T^{1/6})$.
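A minimal single-trajectory sketch of the pattern described above follows: a fast SARSA-style critic, a slower actor step, and $\epsilon$-greedy exploration mixed into a tabular softmax policy. The reward and transition stream are placeholders, and the actor step uses the simple tabular-softmax form of the natural-gradient direction (moving the logits along the critic's Q estimate), which may differ in details from the update analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(5)
S, A = 4, 3
gamma, eps = 0.9, 0.1
alpha_c, alpha_a = 0.05, 0.01   # critic (fast) and actor (slow) step sizes

theta = np.zeros((S, A))        # tabular softmax policy parameters
Q = np.zeros((S, A))            # critic estimate

def softmax_policy(theta, s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def explore(theta, s):
    # epsilon-greedy mixture of the current policy with the uniform policy.
    return (1 - eps) * softmax_policy(theta, s) + eps / A

s = rng.integers(S)
a = rng.choice(A, p=explore(theta, s))
for t in range(10_000):
    r = rng.normal()                       # hypothetical reward
    s_next = rng.integers(S)               # hypothetical transition
    a_next = rng.choice(A, p=explore(theta, s_next))
    # Critic: one-step SARSA-style update on the fast time scale.
    Q[s, a] += alpha_c * (r + gamma * Q[s_next, a_next] - Q[s, a])
    # Actor: tabular-softmax natural-gradient-style step using the critic's Q.
    theta[s] += alpha_a * Q[s]
    s, a = s_next, a_next
```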
Abstract:Automated decision making systems are increasingly being used in real-world applications. In these systems, the decision rules are for the most part derived by minimizing the training error on the available historical data. Therefore, if the data contain a bias related to a sensitive attribute such as gender, race, or religion, say due to cultural or historical discriminatory practices against a certain demographic, the system could perpetuate this discrimination by incorporating the bias into its decision rule. We present an information theoretic framework for designing fair predictors from data, which aims to prevent discrimination against a specified sensitive attribute in a supervised learning setting. We use equalized odds as the criterion for discrimination, which demands that the prediction be independent of the protected attribute conditioned on the actual label. To ensure fairness and generalization simultaneously, we compress the data into an auxiliary variable, which is used for the prediction task. This auxiliary variable is chosen such that it is decontaminated from the discriminatory attribute in the sense of equalized odds. The final predictor is obtained by applying a Bayesian decision rule to the auxiliary variable.
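To make the equalized-odds criterion concrete, the short sketch below measures the gaps in true- and false-positive rates between two groups; a predictor satisfying equalized odds drives both gaps to zero. The data here are random placeholders, not the auxiliary-variable construction of the paper.

```python
import numpy as np

def equalized_odds_gaps(y_true, y_pred, group):
    """Gaps in true/false positive rates between two groups (equalized odds)."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    gaps = {}
    for name, label in (("tpr_gap", 1), ("fpr_gap", 0)):
        rates = []
        for g in (0, 1):
            mask = (group == g) & (y_true == label)   # condition on the actual label
            rates.append(y_pred[mask].mean())
        gaps[name] = abs(rates[0] - rates[1])
    return gaps

# Hypothetical data: a fair predictor should drive both gaps toward zero.
rng = np.random.default_rng(6)
y_true = rng.integers(0, 2, size=1000)
group = rng.integers(0, 2, size=1000)
y_pred = rng.integers(0, 2, size=1000)
print(equalized_odds_gaps(y_true, y_pred, group))
```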