
Title: Upper and Lower Bounds for Distributionally Robust Off-Dynamics Reinforcement Learning

Sep 30, 2024

Zhishuai Liu, Weixin Wang, Pan Xu


Abstract: We study off-dynamics reinforcement learning (RL), where the policy training and deployment environments differ. To deal with this environmental perturbation, we focus on learning policies that are robust to uncertainties in the transition dynamics under the framework of distributionally robust Markov decision processes (DRMDPs), where the nominal and perturbed dynamics are linear Markov decision processes. We propose a novel algorithm, We-DRIVE-U, that enjoys an average suboptimality of $\widetilde{\mathcal{O}}\big({d H \cdot \min \{1/{\rho}, H\}/\sqrt{K} }\big)$, where $K$ is the number of episodes, $H$ is the horizon length, $d$ is the feature dimension, and $\rho$ is the uncertainty level. This result improves the state of the art by $\mathcal{O}(dH/\min\{1/\rho,H\})$. We also construct a novel hard instance and derive the first information-theoretic lower bound in this setting, which indicates that our algorithm is near-optimal up to $\mathcal{O}(\sqrt{H})$ for any uncertainty level $\rho\in(0,1]$. Our algorithm also enjoys a 'rare-switching' design, and thus requires only $\mathcal{O}(dH\log(1+H^2K))$ policy switches and $\mathcal{O}(d^2H\log(1+H^2K))$ calls to an oracle for solving dual optimization problems. This significantly improves on the computational efficiency of existing algorithms for DRMDPs, whose policy switch and oracle complexities are both $\mathcal{O}(K)$.
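To get a feel for how the quantities in the abstract scale, the following minimal Python sketch evaluates only the leading-order terms of the stated bounds. It is an illustration, not part of the paper: all constants and logarithmic factors hidden by the $\mathcal{O}$ and $\widetilde{\mathcal{O}}$ notation are dropped, and the function names and the example values of $d$, $H$, $K$, and $\rho$ are hypothetical.

```python
import math


def suboptimality_rate(d: int, H: int, K: int, rho: float) -> float:
    """Leading-order term d * H * min(1/rho, H) / sqrt(K).

    Constants and log factors from the O-tilde notation are omitted,
    so only the scaling (not the absolute value) is meaningful.
    """
    return d * H * min(1.0 / rho, H) / math.sqrt(K)


def policy_switches(d: int, H: int, K: int) -> float:
    """Leading-order term d * H * log(1 + H^2 * K) for policy switches."""
    return d * H * math.log(1 + H**2 * K)


def oracle_calls(d: int, H: int, K: int) -> float:
    """Leading-order term d^2 * H * log(1 + H^2 * K) for dual-oracle calls."""
    return d**2 * H * math.log(1 + H**2 * K)


if __name__ == "__main__":
    # Hypothetical problem sizes chosen only for illustration.
    d, H, rho = 10, 20, 0.5
    for K in (10**4, 10**6):
        print(
            f"K={K}: rate~{suboptimality_rate(d, H, K, rho):.3f}, "
            f"switches~{policy_switches(d, H, K):.0f}, "
            f"oracle calls~{oracle_calls(d, H, K):.0f}"
        )
```

Under these assumptions, the switch and oracle-call terms grow only logarithmically in $K$, whereas the abstract states that existing DRMDP algorithms need $\mathcal{O}(K)$ of each.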

* 48 pages, 3 figures, 2 tables
