Title: Per-decision Multi-step Temporal Difference Learning with Control Variates

Jul 05, 2018

Kristopher De Asis, Richard S. Sutton

Figures 1 to 4 for Per-decision Multi-step Temporal Difference Learning with Control Variates

Abstract: Multi-step temporal difference (TD) learning is an important approach in reinforcement learning, as it unifies one-step TD learning with Monte Carlo methods such that intermediate algorithms can outperform either extreme. Multi-step methods address a bias-variance trade-off between relying on current estimates, which could be poor, and incorporating longer sampled reward sequences into the updates. Especially in the off-policy setting, where the agent aims to learn about a policy different from the one generating its behaviour, the variance in the updates can cause learning to diverge as the number of sampled rewards used in the estimates increases. In this paper, we introduce per-decision control variates for multi-step TD algorithms, and compare them to existing methods. Our results show that including the control variates can greatly improve performance on both on-policy and off-policy multi-step temporal difference learning tasks.
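The abstract does not give the update rule, so the following is only a minimal sketch of one common way to write a per-decision, importance-weighted n-step return with a state-value control variate. It assumes the recursive form G = rho * (R + gamma * G_next) + (1 - rho) * V(S), bootstrapped from V at the final state. The function names, argument layout, and numbers are hypothetical illustrations, not the authors' code.

```python
from typing import Sequence


def per_decision_return(rewards: Sequence[float], values: Sequence[float],
                        rhos: Sequence[float], gamma: float) -> float:
    """Per-decision importance-sampled n-step return, no control variate.

    Assumed layout (hypothetical): rewards[k] = R_{t+k+1},
    values[k] = V(S_{t+k}) for k = 0..n, rhos[k] = pi(A|S) / b(A|S) at step t+k.
    """
    n = len(rewards)
    g = values[n]  # bootstrap from the value estimate of the last state
    for k in reversed(range(n)):
        g = rhos[k] * (rewards[k] + gamma * g)
    return g


def per_decision_return_cv(rewards: Sequence[float], values: Sequence[float],
                           rhos: Sequence[float], gamma: float) -> float:
    """Same return with an assumed state-value control variate.

    The extra term (1 - rho) * V(S) has expected value zero under the
    behaviour policy, because the expected importance ratio is 1.
    """
    n = len(rewards)
    g = values[n]
    for k in reversed(range(n)):
        g = rhos[k] * (rewards[k] + gamma * g) + (1.0 - rhos[k]) * values[k]
    return g


# Toy two-step trajectory with made-up numbers.
rewards = [1.0, 1.0]
values = [0.5, 0.5, 2.0]
on_policy = per_decision_return_cv(rewards, values, [1.0, 1.0], 0.5)
cut_plain = per_decision_return(rewards, values, [0.0, 0.0], 0.5)
cut_cv = per_decision_return_cv(rewards, values, [0.0, 0.0], 0.5)
```

In this sketch, when every ratio is 1 the control variate vanishes and the result is the ordinary n-step return. When a ratio is 0, the plain estimator collapses the target to zero, whereas the control-variate version falls back to the current estimate V(S), which is one intuition for why the variance of the update can be lower.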

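* Kristopher De Asis, Richard S. Sutton (2018). Per-decision Multi-step Temporal Difference Learning with Control Variates. In Conference on Uncertainty in Artificial Intelligence. http://auai.org/uai2018/proceedings/papers/282.pdf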
