Abstract: We propose a generic reward shaping approach for improving the rate of convergence in reinforcement learning (RL), called Self Improvement Based REwards, or SIBRE. The approach can be used in episodic environments in conjunction with any existing RL algorithm, and consists of rewarding improvement over the agent's own past performance. We show that SIBRE converges under the same conditions as the algorithm whose reward it modifies. The new rewards help discriminate between policies when the original rewards are either weakly discriminative or sparse. Experiments show that in certain environments this approach speeds up learning and converges to the optimal policy faster. We analyse SIBRE theoretically, and follow up with tests on several well-known benchmark environments for reinforcement learning.
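The core mechanism described above, rewarding improvement over the agent's own past performance, can be illustrated with a minimal sketch. The Python wrapper below is an illustration under assumptions rather than the paper's exact formulation: it assumes the performance baseline is an exponential moving average of past episode returns, that the shaped reward is delivered only at episode termination, and all names (the class, `beta`, `shape`) are hypothetical.

```python
class SIBREStyleShaper:
    """Sketch of improvement-based reward shaping (assumed formulation)."""

    def __init__(self, beta=0.1):
        self.beta = beta            # baseline update rate (assumed hyper-parameter)
        self.baseline = 0.0         # running estimate of past episode returns
        self.episode_return = 0.0   # return accumulated in the current episode

    def shape(self, reward, done):
        """Return the shaped reward for one environment step."""
        self.episode_return += reward
        if not done:
            # Intermediate steps carry no shaped reward in this sketch.
            return 0.0
        # Terminal step: reward the improvement over the agent's own past performance.
        shaped = self.episode_return - self.baseline
        # Move the baseline toward the latest return (exponential moving average).
        self.baseline += self.beta * (self.episode_return - self.baseline)
        self.episode_return = 0.0
        return shaped
```

In use, the shaped value returned at episode termination would replace (or augment) the reward fed to the underlying RL algorithm, so that a policy is reinforced only to the extent that it outperforms the agent's recent history.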