Title: Single-partition adaptive Q-learning

Jul 14, 2020

João Pedro Araújo, Mário Figueiredo, Miguel Ayala Botto

Abstract: This paper introduces single-partition adaptive Q-learning (SPAQL), an algorithm for model-free episodic reinforcement learning (RL), which adaptively partitions the state-action space of a Markov decision process (MDP), while simultaneously learning a time-invariant policy (i.e., the mapping from states to actions does not depend explicitly on the episode time step) for maximizing the cumulative reward. The trade-off between exploration and exploitation is handled by using a mixture of upper confidence bounds (UCB) and Boltzmann exploration during training, with a temperature parameter that is automatically tuned as training progresses. The algorithm is an improvement over adaptive Q-learning (AQL). It converges faster to the optimal solution, while also using fewer arms. Tests on episodes with a large number of time steps show that SPAQL has no problems scaling, unlike AQL. Based on this empirical evidence, we claim that SPAQL may have a higher sample efficiency than AQL, thus being a relevant contribution to the field of efficient model-free RL methods.
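
To make the exploration idea in the abstract concrete, here is a minimal Python sketch of one way to combine a UCB-style bonus with Boltzmann (softmax) sampling over a finite set of arms. It is an illustration under stated assumptions, not the authors' implementation: the function names (`ucb_scores`, `boltzmann_probs`, `select_arm`), the `bonus_scale / sqrt(visits)` bonus, and the greedy fallback at zero temperature are all hypothetical choices. The abstract does not specify how the temperature is tuned automatically, so the temperature is simply passed in as a parameter here.

```python
import math
import random


def ucb_scores(q_values, visit_counts, bonus_scale=1.0):
    """Add an optimism bonus that shrinks as an arm is visited more often.

    The bonus_scale / sqrt(n) form is an assumed, generic UCB-style bonus.
    """
    return [q + bonus_scale / math.sqrt(max(n, 1))
            for q, n in zip(q_values, visit_counts)]


def boltzmann_probs(scores, temperature):
    """Softmax over scores; higher temperature gives a flatter distribution."""
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def select_arm(q_values, visit_counts, temperature, bonus_scale=1.0, rng=random):
    """Pick an arm index by sampling from a softmax over UCB-style scores."""
    scores = ucb_scores(q_values, visit_counts, bonus_scale)
    if temperature <= 0:
        # Assumed convention: zero temperature means acting greedily.
        return max(range(len(scores)), key=scores.__getitem__)
    probs = boltzmann_probs(scores, temperature)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]
```

With a low temperature the sampling concentrates on the arm with the highest optimistic score, and with a high temperature it approaches uniform exploration, which is the trade-off the temperature parameter is meant to control.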

* 34 pages, 15 figures
