Title: Improved Sample Complexity for Incremental Autonomous Exploration in MDPs

Dec 29, 2020

Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric

Figure 1 for Improved Sample Complexity for Incremental Autonomous Exploration in MDPs

Figure 2 for Improved Sample Complexity for Incremental Autonomous Exploration in MDPs

Figure 3 for Improved Sample Complexity for Incremental Autonomous Exploration in MDPs

Figure 4 for Improved Sample Complexity for Incremental Autonomous Exploration in MDPs

Abstract: We investigate the exploration of an unknown environment when no reward function is provided. Building on the incremental exploration setting introduced by Lim and Auer [1], we define the objective of learning the set of $\epsilon$-optimal goal-conditioned policies attaining all states that are incrementally reachable within $L$ steps (in expectation) from a reference state $s_0$. In this paper, we introduce a novel model-based approach that interleaves discovering new states from $s_0$ and improving the accuracy of a model estimate that is used to compute goal-conditioned policies to reach newly discovered states. The resulting algorithm, DisCo, achieves a sample complexity scaling as $\tilde{O}(L^5 S_{L+\epsilon} \Gamma_{L+\epsilon} A \epsilon^{-2})$, where $A$ is the number of actions, $S_{L+\epsilon}$ is the number of states that are incrementally reachable from $s_0$ in $L+\epsilon$ steps, and $\Gamma_{L+\epsilon}$ is the branching factor of the dynamics over such states. This improves over the algorithm proposed in [1] in both $\epsilon$ and $L$ at the cost of an extra $\Gamma_{L+\epsilon}$ factor, which is small in most environments of interest. Furthermore, DisCo is the first algorithm that can return an $\epsilon/c_{\min}$-optimal policy for any cost-sensitive shortest-path problem defined on the $L$-reachable states with minimum cost $c_{\min}$. Finally, we report preliminary empirical results confirming our theoretical findings.
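To get a feel for how the stated bound scales, the sketch below evaluates only its leading term, $L^5 S_{L+\epsilon} \Gamma_{L+\epsilon} A \epsilon^{-2}$, ignoring the constants and logarithmic factors hidden by $\tilde{O}$. The function names and the nested-dict layout of the transition model are hypothetical and chosen for illustration. The helper `branching_factor` also assumes that the branching factor means the largest number of possible next states over all state-action pairs, restricted to the states of interest; the abstract does not spell out this definition.

```python
def branching_factor(P, states):
    """Largest number of distinct successors with positive probability.

    ASSUMPTION: "branching factor" is read here as the maximum support
    size of the transition distribution over all (state, action) pairs,
    counting only successors inside `states`.
    P[s][a] is a dict mapping next_state -> probability (hypothetical layout).
    """
    keep = set(states)
    return max(
        sum(1 for s2, p in P[s][a].items() if p > 0 and s2 in keep)
        for s in keep
        for a in P[s]
    )


def leading_term(L, S, Gamma, A, eps):
    """Leading term L^5 * S * Gamma * A / eps^2 of the bound.

    Constants and log factors are dropped, so this is useful only for
    comparing how the bound scales, not as an actual sample count.
    """
    return L**5 * S * Gamma * A / eps**2


# Toy 3-state, 2-action model: from state 0, action 1 splits between 1 and 2.
P = {
    0: {0: {0: 1.0}, 1: {1: 0.5, 2: 0.5}},
    1: {0: {1: 1.0}, 1: {1: 1.0}},
    2: {0: {2: 1.0}, 1: {2: 1.0}},
}
gamma = branching_factor(P, states=[0, 1, 2])
print(gamma, leading_term(L=2, S=3, Gamma=gamma, A=2, eps=0.5))
```

Under these assumptions, halving $\epsilon$ multiplies the leading term by 4, while doubling $L$ multiplies it by 32, which is why improvements in the exponent of $L$ matter in this setting.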

* NeurIPS 2020
