Title: Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation

Jun 23, 2022

Pihe Hu, Yu Chen, Longbo Huang

Abstract: We study reinforcement learning with linear function approximation, where the transition probability and reward functions are linear with respect to a feature mapping $\boldsymbol{\phi}(s,a)$. Specifically, we consider the episodic inhomogeneous linear Markov Decision Process (MDP) and propose a novel, computationally efficient algorithm, LSVI-UCB$^+$, which achieves an $\widetilde{O}(Hd\sqrt{T})$ regret bound, where $H$ is the episode length, $d$ is the feature dimension, and $T$ is the number of steps. LSVI-UCB$^+$ builds on weighted ridge regression and upper confidence value iteration with a Bernstein-type exploration bonus. Our statistical results are obtained with novel analytical tools, including a new Bernstein self-normalized bound with conservatism on elliptical potentials, and a refined analysis of the correction term. To the best of our knowledge, this is the first minimax optimal algorithm for linear MDPs up to logarithmic factors, which closes the $\sqrt{Hd}$ gap between the best known upper bound of $\widetilde{O}(\sqrt{H^3d^3T})$ in \cite{jin2020provably} and the lower bound of $\Omega(Hd\sqrt{T})$ for linear MDPs.
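The two building blocks named in the abstract, weighted ridge regression and an optimistic value estimate with an exploration bonus, can be illustrated with a minimal NumPy sketch. This is not the paper's LSVI-UCB$^+$: the function names `weighted_ridge` and `optimistic_value`, the per-sample variance weights, the confidence radius `beta`, and the clipping level `v_max` are all assumptions introduced for illustration, and the bonus is the generic elliptical form rather than the specific Bernstein-type bonus analyzed in the paper.

```python
import numpy as np


def weighted_ridge(features, targets, variances, lam=1.0):
    """Solve min_w sum_i (phi_i @ w - y_i)^2 / var_i + lam * ||w||^2.

    Returns the solution w and the weighted Gram matrix
    Lambda = lam * I + sum_i phi_i phi_i^T / var_i.
    All argument names are hypothetical, chosen for this sketch.
    """
    features = np.asarray(features, dtype=float)   # shape (n, d)
    targets = np.asarray(targets, dtype=float)     # shape (n,)
    inv_var = 1.0 / np.asarray(variances, dtype=float)
    d = features.shape[1]
    # Samples with a larger variance estimate contribute less.
    gram = lam * np.eye(d) + (features * inv_var[:, None]).T @ features
    rhs = features.T @ (inv_var * targets)
    w = np.linalg.solve(gram, rhs)
    return w, gram


def optimistic_value(phi, w, gram, beta, v_max):
    """Linear estimate plus an elliptical bonus, clipped at v_max.

    bonus = beta * sqrt(phi^T Lambda^{-1} phi); beta is an assumed
    confidence radius, not the value derived in the paper.
    """
    bonus = beta * np.sqrt(phi @ np.linalg.solve(gram, phi))
    return min(float(phi @ w + bonus), v_max)
```

In this sketch the bonus shrinks along directions of feature space that have been observed often and with low variance, which is the general mechanism behind optimism-driven exploration in linear settings.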

* Accepted by ICML 2022
