Title: Towards Characterizing Divergence in Deep Q-Learning

Mar 21, 2019

Joshua Achiam, Ethan Knight, Pieter Abbeel

Figure 1 for Towards Characterizing Divergence in Deep Q-Learning

Figure 2 for Towards Characterizing Divergence in Deep Q-Learning

Figure 3 for Towards Characterizing Divergence in Deep Q-Learning

Figure 4 for Towards Characterizing Divergence in Deep Q-Learning

Abstract: Deep Q-Learning (DQL), a family of temporal difference algorithms for control, employs three techniques collectively known as the 'deadly triad' in reinforcement learning: bootstrapping, off-policy learning, and function approximation. Prior work has demonstrated that together these can lead to divergence in Q-learning algorithms, but the conditions under which divergence occurs are not well understood. In this note, we give a simple analysis based on a linear approximation to the Q-value updates, which we believe provides insight into divergence under the deadly triad. The central point in our analysis is to consider when the leading-order approximation to the deep-Q update is or is not a contraction in the sup norm. Based on this analysis, we develop an algorithm that permits stable deep Q-learning for continuous control without any of the tricks conventionally used (such as target networks, adaptive gradient optimizers, or multiple Q functions). We demonstrate that our algorithm performs above or near state-of-the-art on standard MuJoCo benchmarks from the OpenAI Gym.
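The contraction question raised in the abstract can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration, since the abstract does not spell out the update: the leading-order update is taken to have the form U(Q) = Q + alpha * K * D * (T Q - Q), where K is a kernel describing how an update at one state-action pair generalizes to others, D is a diagonal matrix of data-distribution weights rho, and T is a Bellman operator. To keep the map affine, the sketch swaps in a fixed-policy Bellman operator T Q = r + gamma * P Q instead of the optimal (max) operator used in Q-learning. Under that simplification, U is a sup-norm contraction exactly when the induced infinity-norm of M = I - alpha * K * D * (I - gamma * P) is below 1. The function name and the two-state example are hypothetical.

```python
def sup_norm_gain(K, rho, P, gamma, alpha):
    """Induced infinity-norm of M = I - alpha * K * D * (I - gamma * P).

    Assumed model (not taken from the abstract): a linearized update
    U(Q) = Q + alpha * K * D * (r + gamma * P * Q - Q), so that
    U(Q1) - U(Q2) = M * (Q1 - Q2). U contracts in the sup norm
    if and only if the returned value is strictly below 1.
    """
    n = len(K)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # B = D * (I - gamma * P), with D = diag(rho)
    B = [[rho[i] * (eye[i][j] - gamma * P[i][j]) for j in range(n)]
         for i in range(n)]
    # KB = K * B
    KB = [[sum(K[i][k] * B[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    M = [[eye[i][j] - alpha * KB[i][j] for j in range(n)] for i in range(n)]
    # Infinity-norm of a matrix is its maximum absolute row sum.
    return max(sum(abs(x) for x in row) for row in M)


# Hypothetical two-state chain: state 0 moves to state 1, state 1 loops.
P = [[0.0, 1.0],
     [0.0, 1.0]]
rho = [0.9, 0.1]      # data mostly visits state 0 (off-policy flavour)
gamma, alpha = 0.9, 0.1

# Tabular case: updates do not generalize, so K is the identity.
K_tabular = [[1.0, 0.0],
             [0.0, 1.0]]

# Linear features phi = [1, 2], so K = phi * phi^T couples the two states.
K_general = [[1.0, 2.0],
             [2.0, 4.0]]

gain_tabular = sup_norm_gain(K_tabular, rho, P, gamma, alpha)
gain_general = sup_norm_gain(K_general, rho, P, gamma, alpha)

print("tabular gain:", gain_tabular)      # below 1: contraction
print("generalizing gain:", gain_general)  # above 1: not a contraction
```

In this toy setting the tabular kernel gives a gain just under 1, while the strongly generalizing kernel pushes the gain above 1, which is the kind of distinction the abstract's analysis is concerned with. A gain above 1 only shows that the contraction argument fails; it does not by itself prove that the iterates diverge.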
