Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Revisiting Peng's Q for Modern Reinforcement Learning

Feb 27, 2021

Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel

Figure 1 for Revisiting Peng's Q for Modern Reinforcement Learning

Figure 2 for Revisiting Peng's Q for Modern Reinforcement Learning

Figure 3 for Revisiting Peng's Q for Modern Reinforcement Learning

Figure 4 for Revisiting Peng's Q for Modern Reinforcement Learning

Share this with someone who'll enjoy it:

Abstract:Off-policy multi-step reinforcement learning algorithms consist of conservative and non-conservative algorithms: the former actively cut traces, whereas the latter do not. Recently, Munos et al. (2016) proved the convergence of conservative algorithms to an optimal Q-function. In contrast, non-conservative algorithms are thought to be unsafe and have a limited or no theoretical guarantee. Nonetheless, recent studies have shown that non-conservative algorithms empirically outperform conservative ones. Motivated by the empirical results and the lack of theory, we carry out theoretical analyses of Peng's Q($\lambda$), a representative example of non-conservative algorithms. We prove that it also converges to an optimal policy provided that the behavior policy slowly tracks a greedy policy in a way similar to conservative policy iteration. Such a result has been conjectured to be true but has not been proven. We also experiment with Peng's Q($\lambda$) in complex continuous control tasks, confirming that Peng's Q($\lambda$) often outperforms conservative algorithms despite its simplicity. These results indicate that Peng's Q($\lambda$), which was thought to be unsafe, is a theoretically-sound and practically effective algorithm.

* 26 pages, 7 figures, 2 tables

View paper on

Share this with someone who'll enjoy it:

Title:Revisiting Peng's Q for Modern Reinforcement Learning

Paper and Code