Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Improving On-policy Learning with Statistical Reward Accumulation

Sep 07, 2018

Yubin Deng, Ke Yu, Dahua Lin, Xiaoou Tang, Chen Change Loy

Figure 1 for Improving On-policy Learning with Statistical Reward Accumulation

Figure 2 for Improving On-policy Learning with Statistical Reward Accumulation

Figure 3 for Improving On-policy Learning with Statistical Reward Accumulation

Figure 4 for Improving On-policy Learning with Statistical Reward Accumulation

Share this with someone who'll enjoy it:

Abstract:Deep reinforcement learning has obtained significant breakthroughs in recent years. Most methods in deep-RL achieve good results via the maximization of the reward signal provided by the environment, typically in the form of discounted cumulative returns. Such reward signals represent the immediate feedback of a particular action performed by an agent. However, tasks with sparse reward signals are still challenging to on-policy methods. In this paper, we introduce an effective characterization of past reward statistics (which can be seen as long-term feedback signals) to supplement this immediate reward feedback. In particular, value functions are learned with multi-critics supervision, enabling complex value functions to be more easily approximated in on-policy learning, even when the reward signals are sparse. We also introduce a novel exploration mechanism called "hot-wiring" that can give a boost to seemingly trapped agents. We demonstrate the effectiveness of our advantage actor multi-critic (A2MC) method across the discrete domains in Atari games as well as continuous domains in the MuJoCo environments. A video demo is provided at https://youtu.be/zBmpf3Yz8tc.

View paper on

OpenReview

Share this with someone who'll enjoy it:

Title:Improving On-policy Learning with Statistical Reward Accumulation

Paper and Code