Title: Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes

Feb 22, 2023

Emmeran Johnson, Ciara Pike-Burke, Patrick Rebeschini

Figure 1 for Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes

Figure 2 for Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes

Abstract: The classical algorithms used in tabular reinforcement learning (Value Iteration and Policy Iteration) have been shown to converge linearly with a rate given by the discount factor $\gamma$ of a discounted Markov Decision Process. Recently, there has been increased interest in the study of gradient-based methods. In this work, we show that the dimension-free linear $\gamma$-rate of classical reinforcement learning algorithms can be achieved by a general family of unregularised Policy Mirror Descent (PMD) algorithms under an adaptive step-size. We also provide a matching worst-case lower bound that demonstrates that the $\gamma$-rate is optimal for PMD methods. Our work offers a novel perspective on the convergence of PMD. We avoid the use of the performance difference lemma beyond establishing the monotonic improvement of the iterates, which leads to a simple analysis that may be of independent interest. We also extend our analysis to the inexact setting and establish the first dimension-free $\varepsilon$-optimal sample complexity for unregularised PMD under a generative model, improving upon the best-known result.
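To make the two facts the abstract builds on concrete, here is a minimal sketch on a small random tabular MDP. It checks numerically that Value Iteration contracts the sup-norm error by a factor of $\gamma$ per step, and that exact PMD iterates improve monotonically. Everything in it is an illustrative assumption rather than the authors' code: the MDP is randomly generated, the mirror map is the KL divergence (one member of the PMD family, which gives a multiplicative-weights update), and the step size `eta=5.0` is a fixed constant, not the adaptive step-size rule analysed in the paper.

```python
import math
import random

GAMMA = 0.8  # discount factor (illustrative value)


def make_mdp(n_states=4, n_actions=3, seed=0):
    """Hypothetical random MDP: P[s][a] is a next-state distribution, R[s][a] in [0, 1]."""
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_states):
        p_row, r_row = [], []
        for _ in range(n_actions):
            w = [rng.random() + 1e-3 for _ in range(n_states)]
            total = sum(w)
            p_row.append([x / total for x in w])
            r_row.append(rng.random())
        P.append(p_row)
        R.append(r_row)
    return P, R


def q_from_v(P, R, V):
    """Q(s, a) = R(s, a) + gamma * E[V(s')]."""
    return [[R[s][a] + GAMMA * sum(p * v for p, v in zip(P[s][a], V))
             for a in range(len(R[s]))] for s in range(len(R))]


def bellman_optimality(P, R, V):
    """One Value Iteration step: V(s) <- max_a Q(s, a)."""
    return [max(q_s) for q_s in q_from_v(P, R, V)]


def evaluate(P, R, pi, sweeps=200):
    """Policy evaluation by fixed-point iteration (error of order gamma**sweeps)."""
    V = [0.0] * len(R)
    for _ in range(sweeps):
        Q = q_from_v(P, R, V)
        V = [sum(p * q for p, q in zip(pi[s], Q[s])) for s in range(len(R))]
    return V


def pmd_kl_step(pi, Q, eta):
    """PMD step with the KL mirror map: pi'(a|s) proportional to pi(a|s) * exp(eta * Q(s, a))."""
    new_pi = []
    for pi_s, q_s in zip(pi, Q):
        m = max(q_s)  # subtract the max for numerical stability
        w = [p * math.exp(eta * (q - m)) for p, q in zip(pi_s, q_s)]
        z = sum(w)
        new_pi.append([x / z for x in w])
    return new_pi


def sup_dist(U, V):
    return max(abs(u - v) for u, v in zip(U, V))


P, R = make_mdp()
n_states, n_actions = len(R), len(R[0])

# Reference optimal value function from many Value Iteration steps.
V_star = [0.0] * n_states
for _ in range(500):
    V_star = bellman_optimality(P, R, V_star)

# Value Iteration: record the sup-norm error after each step.
V = [0.0] * n_states
vi_errors = [sup_dist(V, V_star)]
for _ in range(15):
    V = bellman_optimality(P, R, V)
    vi_errors.append(sup_dist(V, V_star))

# Exact PMD (KL mirror map, fixed step size as an assumption), from the uniform policy.
pi = [[1.0 / n_actions] * n_actions for _ in range(n_states)]
pmd_values = []
for _ in range(30):
    V_pi = evaluate(P, R, pi)
    pmd_values.append(V_pi)
    pi = pmd_kl_step(pi, q_from_v(P, R, V_pi), eta=5.0)
```

After running, `vi_errors` holds the Value Iteration errors $\|V_k - V^\star\|_\infty$ and `pmd_values` holds the value function of each PMD iterate, so successive entries can be compared to inspect the contraction and the monotonic improvement. The constant step size is used only to keep the sketch short; it is not expected to reproduce the $\gamma$-rate that the paper obtains with its adaptive step-size.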

* 27 pages, 1 figure

View paper on OpenReview
