Title: Regret Bounds for Reinforcement Learning via Markov Chain Concentration

Aug 06, 2018

Ronald Ortner

Abstract: We give a simple optimistic algorithm for which it is easy to derive regret bounds of $\tilde{O}(\sqrt{t_{\rm mix} SAT})$ after $T$ steps in uniformly ergodic MDPs with $S$ states, $A$ actions, and mixing time parameter $t_{\rm mix}$. These bounds are the first regret bounds in the general, non-episodic setting with an optimal dependence on all given parameters. They could only be improved by using an alternative mixing time parameter.
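
To get a feel for how the stated bound scales, the following minimal sketch evaluates only the leading term $\sqrt{t_{\rm mix} SAT}$, dropping the constants and logarithmic factors hidden by $\tilde{O}$. The function name and the parameter values are hypothetical, chosen purely for illustration; this is not the paper's algorithm.

```python
import math


def regret_bound_leading_term(t_mix: float, S: int, A: int, T: int) -> float:
    """Leading term sqrt(t_mix * S * A * T) of the bound.

    Constants and log factors hidden by the O-tilde notation are ignored,
    so the value is only meaningful for comparing how the bound scales.
    """
    return math.sqrt(t_mix * S * A * T)


# Hypothetical parameter values (not taken from the paper).
b1 = regret_bound_leading_term(t_mix=10, S=20, A=5, T=100_000)
b4 = regret_bound_leading_term(t_mix=10, S=20, A=5, T=400_000)

# Quadrupling T doubles the leading term, so the per-step regret shrinks.
print(b4 / b1)
print(b1 / 100_000, b4 / 400_000)
```

Because the leading term grows like $\sqrt{T}$, the regret per step falls like $1/\sqrt{T}$, which is the sense in which a sublinear regret bound implies convergence to optimal average reward.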
