Title: Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning algorithms

Sep 01, 2019

Matthia Sabatelli, Gilles Louppe, Pierre Geurts, Marco A. Wiering


Abstract: This paper takes a step towards characterizing a new family of model-free Deep Reinforcement Learning (DRL) algorithms. The aim of these algorithms is to jointly learn an approximation of the state-value function ($V$) alongside an approximation of the state-action value function ($Q$). Our analysis starts with a thorough study of the Deep Quality-Value Learning (DQV) algorithm, a DRL algorithm which has been shown to outperform popular techniques such as Deep-Q-Learning (DQN) and Double-Deep-Q-Learning (DDQN) \cite{sabatelli2018deep}. To investigate why DQV's learning dynamics allow this algorithm to perform so well, we formulate a set of research questions which help us characterize a new family of DRL algorithms. Among our results, we present some specific cases in which DQV's performance can be harmed, and we introduce a novel off-policy DRL algorithm, called DQV-Max, which can outperform DQV. We then study the behavior of the $V$ and $Q$ functions that are learned by DQV and DQV-Max and show that both algorithms might perform so well on several DRL test-beds because they are less prone to suffer from the overestimation bias of the $Q$ function.
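The abstract does not spell out the update rules, so the following is only a minimal sketch of how the temporal-difference targets of DQV and DQV-Max are commonly described. It assumes that DQV regresses both $V$ and $Q$ towards the same target built from a target copy of the $V$ network, and that DQV-Max builds the $V$ target from the maximum of a target copy of the $Q$ network while the $Q$ target bootstraps from the online $V$ network. The function names, argument names, and the `done` handling are hypothetical and chosen for illustration; they are not taken from the authors' code.

```python
def dqv_targets(reward, v_next_target, gamma=0.99, done=False):
    """Sketch of DQV targets (assumed form).

    Both V and Q are regressed towards r + gamma * V_target(s').
    """
    bootstrap = 0.0 if done else gamma * v_next_target
    target = reward + bootstrap
    # Same target for the V update and the Q update.
    return target, target


def dqv_max_targets(reward, q_next_target, v_next, gamma=0.99, done=False):
    """Sketch of DQV-Max targets (assumed form).

    V is regressed towards r + gamma * max_a Q_target(s', a);
    Q is regressed towards r + gamma * V(s').
    """
    if done:
        return reward, reward
    v_target = reward + gamma * max(q_next_target)
    q_target = reward + gamma * v_next
    return v_target, q_target


# Illustrative numbers only: one transition with reward 1.0.
v_t, q_t = dqv_targets(1.0, v_next_target=2.0, gamma=0.5)
vm_t, qm_t = dqv_max_targets(1.0, q_next_target=[0.0, 4.0], v_next=2.0, gamma=0.5)
```

Under these assumptions, neither $Q$ target takes a maximum over the $Q$ network's own next-state estimates, which is one way to read the abstract's claim that both algorithms are less prone to overestimation bias.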
