Title: How to Combine Tree-Search Methods in Reinforcement Learning

Sep 06, 2018

Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Abstract: Finite-horizon lookahead policies are abundantly used in Reinforcement Learning and demonstrate impressive empirical success. Usually, the lookahead policies are implemented with specific planning methods such as Monte Carlo Tree Search (e.g. in AlphaZero). Referring to the planning problem as tree search, a reasonable practice in these implementations is to back up the value only at the leaves while the information obtained at the root is not leveraged other than for updating the policy. Here, we question the potency of this approach. Namely, the latter procedure is non-contractive in general, and its convergence is not guaranteed. Our proposed enhancement is straightforward and simple: use the return from the optimal tree path to back up the values at the descendants of the root. This leads to a $\gamma^h$-contracting procedure, where $\gamma$ is the discount factor and $h$ is the tree depth. To establish our results, we first introduce a notion called multiple-step greedy consistency. We then provide convergence rates for two algorithmic instantiations of the above enhancement in the presence of noise injected into both the tree search stage and value estimation stage.
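The $\gamma^h$ contraction mentioned in the abstract can be illustrated with a minimal sketch. It assumes a small tabular MDP with known transitions and reads "the return from the optimal tree path" as $h$ applications of the Bellman optimality operator to the leaf values. This is an illustrative reading, not the authors' implementation, and every name here (`bellman_optimality`, `h_step_backup`, `random_mdp`) is hypothetical.

```python
import random


def bellman_optimality(v, P, R, gamma):
    """Apply the Bellman optimality operator once on a tabular MDP.

    P[s][a][s2] is a transition probability, R[s][a] a reward (assumed layout).
    """
    n_states = len(v)
    n_actions = len(R[0])
    return [
        max(
            R[s][a] + gamma * sum(P[s][a][s2] * v[s2] for s2 in range(n_states))
            for a in range(n_actions)
        )
        for s in range(n_states)
    ]


def h_step_backup(v, P, R, gamma, h):
    """Value of the best depth-h tree path at each root state, with v at the leaves."""
    for _ in range(h):
        v = bellman_optimality(v, P, R, gamma)
    return v


def sup_distance(u, v):
    """Max-norm distance between two value vectors."""
    return max(abs(a - b) for a, b in zip(u, v))


def random_mdp(n_states, n_actions, rng):
    """Build a random MDP (illustration only; not taken from the paper)."""
    P, R = [], []
    for _ in range(n_states):
        rows = []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            rows.append([x / total for x in w])  # normalise to a distribution
        P.append(rows)
        R.append([rng.random() for _ in range(n_actions)])
    return P, R


rng = random.Random(0)
gamma, h = 0.9, 3
P, R = random_mdp(5, 2, rng)
v1 = [rng.uniform(-1, 1) for _ in range(5)]
v2 = [rng.uniform(-1, 1) for _ in range(5)]

before = sup_distance(v1, v2)
after = sup_distance(
    h_step_backup(v1, P, R, gamma, h),
    h_step_backup(v2, P, R, gamma, h),
)
# The h-step backup shrinks the max-norm gap by at least a factor gamma**h.
print(after <= gamma**h * before + 1e-12)
```

The check at the end only demonstrates the contraction property of the $h$-step backup on one random instance. It does not reproduce the paper's algorithms or its analysis under noise.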
