Title: Taming the Noise in Reinforcement Learning via Soft Updates

Mar 30, 2017

Roy Fox, Ari Pakman, Naftali Tishby

Abstract: Model-free reinforcement learning algorithms, such as Q-learning, perform poorly in the early stages of learning in noisy environments, because much effort is spent unlearning biased estimates of the state-action value function. The bias results from selecting, among several noisy estimates, the apparent optimum, which may actually be suboptimal. We propose G-learning, a new off-policy learning algorithm that regularizes the value estimates by penalizing deterministic policies in the beginning of the learning process. We show that this method reduces the bias of the value-function estimation, leading to faster convergence to the optimal value and the optimal policy. Moreover, G-learning enables the natural incorporation of prior domain knowledge, when available. The stochastic nature of G-learning also makes it avoid some exploration costs, a property usually attributed only to on-policy algorithms. We illustrate these ideas in several examples, where G-learning results in significant improvements of the convergence rate and the cost of the learning process.

* 32nd Conference on Uncertainty in Artificial Intelligence (UAI 2016)
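
The bias described in the abstract, where picking the apparent optimum among several noisy estimates overestimates the true value, can be illustrated with a small simulation. The sketch below is not the G-learning algorithm itself. It only compares a hard maximum with one possible soft aggregate, a log-mean-exp under a uniform prior over actions. The names `soft_value`, `average_bias`, the inverse temperature `beta`, and the Gaussian noise model are all assumptions made for illustration.

```python
import math
import random


def hard_max(estimates):
    """Greedy backup: take the largest noisy estimate."""
    return max(estimates)


def soft_value(estimates, beta):
    """Log-mean-exp of the estimates, (1/beta) * log(mean(exp(beta * q))).

    Assumes a uniform prior over actions (an illustrative choice).
    Small beta approaches the plain mean, large beta approaches the max.
    """
    m = max(estimates)  # subtract the max for numerical stability
    n = len(estimates)
    mean_exp = sum(math.exp(beta * (q - m)) for q in estimates) / n
    return m + math.log(mean_exp) / beta


def average_bias(aggregate, n_actions=5, noise_std=1.0, trials=20000, seed=0):
    """Average aggregated value when every true action value is 0.

    Since the true optimum is 0, the returned average is the bias.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        noisy = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        total += aggregate(noisy)
    return total / trials


hard_bias = average_bias(hard_max)
soft_bias = average_bias(lambda q: soft_value(q, beta=0.5))
print(f"hard max bias: {hard_bias:.3f}")
print(f"soft value bias (beta=0.5): {soft_bias:.3f}")
```

In this toy setting all true values are equal, so any positive average is pure overestimation, and the soft aggregate shows a smaller bias than the hard maximum. When true values differ, a soft aggregate with small `beta` can underestimate instead, so the choice of `beta` matters. The abstract only says the penalty on deterministic policies applies in the beginning of learning and gives no specific schedule here.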
