Policy gradient gives rise to a rich class of reinforcement learning (RL) methods, for example REINFORCE. Yet the best known sample complexity result for such methods to find an $\epsilon$-optimal policy is $\mathcal{O}(\epsilon^{-3})$, which is suboptimal. In this paper, we study the fundamental convergence properties and sample efficiency of first-order policy optimization methods. We focus on a generalized variant of the policy gradient method, which is able to maximize not only a cumulative sum of rewards but also a general utility function of a policy's long-term visitation distribution. By exploiting the problem's hidden convex nature and leveraging techniques from compositional optimization, we propose a Stochastic Incremental Variance-Reduced Policy Gradient (SIVR-PG) approach that improves a sequence of policies so as to provably converge to the globally optimal solution, and that finds an $\epsilon$-optimal policy using $\tilde{\mathcal{O}}(\epsilon^{-2})$ samples.
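
For concreteness, the following is a minimal sketch of the general-utility objective alluded to above; the symbols $R$, $\lambda^{\pi_\theta}$, $r$, and $\gamma$ are illustrative notation chosen here, not taken from the abstract.

% Sketch (illustrative notation): the generalized policy optimization problem
% maximizes a utility R of the discounted state-action visitation distribution
% lambda^{pi_theta} induced by the parameterized policy pi_theta.
\begin{equation*}
  \max_{\theta}\; R\bigl(\lambda^{\pi_\theta}\bigr),
  \qquad
  \lambda^{\pi_\theta}(s,a) \;=\; \sum_{t=0}^{\infty} \gamma^{t}\,
    \Pr\bigl(s_t = s,\, a_t = a \mid \pi_\theta\bigr).
\end{equation*}
% When R is linear, R(lambda) = <r, lambda>, this reduces to the standard
% expected discounted cumulative reward; a nonlinear R covers general utilities.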