Title: Off-Policy Actor-Critic with Shared Experience Replay

Sep 25, 2019

Simon Schmitt, Matteo Hessel, Karen Simonyan

Abstract: We investigate the combination of actor-critic reinforcement learning algorithms with uniform large-scale experience replay and propose solutions for two challenges: (a) efficient actor-critic learning with experience replay, and (b) stability of highly off-policy learning. We employ those insights to accelerate hyper-parameter sweeps in which all participating agents run concurrently and share their experience via a common replay module. To this end, we analyze the bias-variance tradeoffs in V-trace, a form of importance sampling for actor-critic methods. Based on our analysis, we argue for mixing experience sampled from replay with on-policy experience, and propose a new trust region scheme that scales effectively to data distributions where V-trace becomes unstable. We provide extensive empirical validation of the proposed solution. We further show the benefits of this setup by demonstrating state-of-the-art data efficiency on Atari among agents trained for up to 200M environment frames.
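For readers unfamiliar with V-trace, the following is a minimal sketch of how V-trace value targets are commonly computed for a single trajectory. It assumes the standard definition with clipped importance weights (thresholds `rho_bar` and `c_bar`); the abstract itself does not spell out the formula, and the function and parameter names are hypothetical rather than taken from the paper's code.

```python
import math


def vtrace_targets(log_rhos, rewards, values, bootstrap_value,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Sketch of V-trace value targets for one trajectory (illustrative only).

    log_rhos[t]     : log pi(a_t | x_t) - log mu(a_t | x_t), where pi is the
                      learner's policy and mu the behaviour policy.
    rewards[t]      : reward received after taking a_t.
    values[t]       : current value estimate V(x_t).
    bootstrap_value : value estimate for the state after the last step.
    """
    n = len(rewards)
    next_values = list(values[1:]) + [bootstrap_value]
    targets = [0.0] * n
    acc = 0.0  # holds v_{t+1} - V(x_{t+1}) while iterating backwards
    for t in reversed(range(n)):
        ratio = math.exp(log_rhos[t])
        rho = min(rho_bar, ratio)  # clipped weight on the TD error
        c = min(c_bar, ratio)      # clipped weight on the trace
        delta = rho * (rewards[t] + gamma * next_values[t] - values[t])
        acc = delta + gamma * c * acc
        targets[t] = values[t] + acc
    return targets


# On-policy data (all log ratios zero) reduces to ordinary n-step returns.
on_policy = vtrace_targets([0.0, 0.0, 0.0], [1.0, 1.0, 1.0],
                           [0.0, 0.0, 0.0], 0.0, gamma=0.5)
```

When the importance ratios are close to zero, as can happen for replayed data that is far off-policy, the targets collapse toward the current value estimates. This illustrates the bias side of the bias-variance tradeoff that the abstract refers to.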

View paper on OpenReview
