Akhil Bagaria

Flipping Coins to Estimate Pseudocounts for Exploration in Reinforcement Learning

Jun 05, 2023

Sam Lobel, Akhil Bagaria, George Konidaris

Abstract: We propose a new method for count-based exploration in high-dimensional state spaces. Unlike previous work which relies on density models, we show that counts can be derived by averaging samples from the Rademacher distribution (or coin flips). This insight is used to set up a simple supervised learning objective which, when optimized, yields a state's visitation count. We show that our method is significantly more effective at deducing ground-truth visitation counts than previous work; when used as an exploration bonus for a model-free reinforcement learning algorithm, it outperforms existing approaches on most of 9 challenging exploration tasks, including the Atari game Montezuma's Revenge.

* 11 pages (+9 appendix). Published as a conference paper at ICML 2023. Code available at https://github.com/samlobel/CFN/
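
The statistical idea in the abstract above can be illustrated with a small sketch. It is not the authors' implementation (the paper trains a supervised model; see the linked repository for that). It only checks the underlying arithmetic: the mean of n independent Rademacher (±1) samples has an expected square of 1/n, so the reciprocal of the averaged squared mean recovers n. The function name `estimate_count` and the parameters `d` and `seed` are hypothetical, chosen for this illustration.

```python
import random


def estimate_count(n_visits, d=2000, seed=0):
    """Estimate a visitation count from averaged coin flips.

    Assumption for illustration: each visit to a state draws d independent
    Rademacher (+1/-1) samples, one per coordinate. For each coordinate we
    average the n_visits flips; the expected square of that average is
    1 / n_visits, so inverting the mean squared average estimates the count.
    """
    rng = random.Random(seed)
    total_sq = 0.0
    for _ in range(d):
        # Average of n_visits coin flips for this coordinate.
        mean = sum(rng.choice((-1, 1)) for _ in range(n_visits)) / n_visits
        total_sq += mean * mean
    # The mean squared average approximates 1 / n_visits.
    return 1.0 / (total_sq / d)


if __name__ == "__main__":
    for n in (1, 10, 50):
        print(n, round(estimate_count(n), 2))
```

With a single visit every averaged flip is exactly ±1, so the estimate is exactly 1; for larger counts the estimate is noisy and tightens as `d` grows.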

Scaling Goal-based Exploration via Pruning Proto-goals

Feb 09, 2023

Akhil Bagaria, Ray Jiang, Ramana Kumar, Tom Schaul

Abstract: One of the gnarliest challenges in reinforcement learning (RL) is exploration that scales to vast domains, where novelty-, or coverage-seeking behaviour falls short. Goal-directed, purposeful behaviours are able to overcome this, but rely on a good goal space. The core challenge in goal discovery is finding the right balance between generality (not hand-crafted) and tractability (useful, not too many). Our approach explicitly seeks the middle ground, enabling the human designer to specify a vast but meaningful proto-goal space, and an autonomous discovery process to refine this to a narrower space of controllable, reachable, novel, and relevant goals. The effectiveness of goal-conditioned exploration with the latter is then demonstrated in three challenging environments.