Title: An Adiabatic Theorem for Policy Tracking with TD-learning

Oct 30, 2020

Neil Walton

Abstract: We evaluate the ability of temporal difference learning to track the reward function of a policy as it changes over time. Our results apply a new adiabatic theorem that bounds the mixing time of time-inhomogeneous Markov chains. We derive finite-time bounds for tabular temporal difference learning and $Q$-learning when the policy used for training changes in time. To achieve this, we develop bounds for stochastic approximation under asynchronous adiabatic updates.
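
To make the setting concrete, here is a minimal sketch of tabular TD(0) run while the behaviour policy drifts slowly over time. It is not the paper's algorithm or analysis: the two-state chain, the function names (`td0_track`, `example_policy`, `example_step`, `example_reward`), the step size, and the discount factor are all hypothetical choices made for illustration.

```python
import random


def td0_track(n_steps, policy_at, step, reward,
              gamma=0.9, alpha=0.05, n_states=2, seed=0):
    """Tabular TD(0) with a time-varying policy (illustrative sketch).

    policy_at(t)    -> probability of choosing action 1 at time t
    step(s, a, rng) -> next state
    reward(s, a)    -> immediate reward
    """
    rng = random.Random(seed)
    V = [0.0] * n_states  # one value estimate per state
    s = 0
    for t in range(n_steps):
        # The policy is re-read at every step, so it may change over time.
        a = 1 if rng.random() < policy_at(t) else 0
        s_next = step(s, a, rng)
        r = reward(s, a)
        # Asynchronous update: only the visited state's entry changes.
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next
    return V


def example_policy(t, horizon=50_000):
    # Hypothetical slow drift: P(action 1) moves linearly from 0.2 to 0.8.
    return 0.2 + 0.6 * min(1.0, t / horizon)


def example_step(s, a, rng):
    # Hypothetical dynamics: action 1 flips the state w.p. 0.9, action 0 w.p. 0.1.
    flip_prob = 0.9 if a == 1 else 0.1
    return 1 - s if rng.random() < flip_prob else s


def example_reward(s, a):
    # Hypothetical reward: 1 in state 1, 0 in state 0.
    return 1.0 if s == 1 else 0.0


if __name__ == "__main__":
    print(td0_track(50_000, example_policy, example_step, example_reward))
```

Because the policy changes only slightly between consecutive steps, the value estimates in `V` can follow the value function of the current policy with some lag. The abstract's finite-time bounds concern this kind of tracking; the sketch makes no claim about the error it achieves.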
