Zhenan Wu

Efficient, Low-Regret, Online Reinforcement Learning for Linear MDPs

Nov 16, 2024

Philips George John, Arnab Bhattacharyya, Silviu Maniu, Dimitrios Myrisiotis, Zhenan Wu

Abstract: Reinforcement learning algorithms are usually stated without theoretical guarantees regarding their performance. Recently, Jin, Yang, Wang, and Jordan (COLT 2020) gave a polynomial-time reinforcement learning algorithm, LSVI-UCB, for the setting of linear Markov decision processes, and provided theoretical guarantees on its running time and regret. In real-world scenarios, however, the space usage of this algorithm can be prohibitive because of its linear regression step. We propose and analyze two modifications of LSVI-UCB, which alternate periods of learning and not-learning, to reduce space and time usage while maintaining sublinear regret. We show experimentally, on synthetic data and real-world benchmarks, that our algorithms achieve low space usage and running time, while not significantly sacrificing regret.
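The idea of alternating learning and not-learning periods can be illustrated with a toy sketch. It is not the paper's algorithm: the fixed schedule `is_learning_episode`, the period lengths `learn_len` and `idle_len`, and the random unit feature vectors are all assumptions made for illustration. Only the bonus form, beta * sqrt(phi^T Lambda^{-1} phi), follows the standard LSVI-UCB definition. The sketch shows that when features are stored only during learning episodes, the regression data grows with the number of learning episodes rather than with all episodes.

```python
import numpy as np

def ucb_bonus(Lambda_inv, phi, beta):
    """Standard LSVI-UCB style bonus: beta * sqrt(phi^T Lambda^{-1} phi)."""
    return beta * np.sqrt(phi @ Lambda_inv @ phi)

def is_learning_episode(k, learn_len, idle_len):
    """Hypothetical fixed schedule: learn_len learning episodes,
    then idle_len non-learning episodes, repeated."""
    return (k % (learn_len + idle_len)) < learn_len

def run(num_episodes, d, learn_len, idle_len, lam=1.0, seed=0):
    """Toy loop: one random unit feature vector per episode (an assumption).
    Features are stored and the Gram matrix updated only when learning."""
    rng = np.random.default_rng(seed)
    Lambda = lam * np.eye(d)   # regularized Gram matrix
    stored = []                # data kept for the regression step
    for k in range(num_episodes):
        phi = rng.normal(size=d)
        phi /= np.linalg.norm(phi)
        if is_learning_episode(k, learn_len, idle_len):
            stored.append(phi)
            Lambda += np.outer(phi, phi)
    return Lambda, len(stored)

Lambda, n_stored = run(num_episodes=100, d=4, learn_len=2, idle_len=8)
bonus = ucb_bonus(np.linalg.inv(Lambda), np.eye(4)[0], beta=1.0)
print(n_stored, bonus)
```

With these hypothetical settings, only the learning episodes contribute stored data. How to choose the period lengths so that regret stays sublinear is the subject of the paper's analysis and is not captured by this sketch.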

* 27 pages, 9 figures