Akash Reddy

Matching options to tasks using Option-Indexed Hierarchical Reinforcement Learning

Jun 12, 2022

Kushal Chauhan, Soumya Chatterjee, Akash Reddy, Balaraman Ravindran, Pradeep Shenoy

Abstract: The options framework in Hierarchical Reinforcement Learning breaks down overall goals into a combination of options or simpler tasks and associated policies, allowing for abstraction in the action space. Ideally, these options can be reused across different higher-level goals; indeed, such reuse is necessary to realize the vision of a continual learning agent that can effectively leverage its prior experience. Previous approaches have only proposed limited forms of transfer of prelearned options to new task settings. We propose a novel option indexing approach to hierarchical learning (OI-HRL), where we learn an affinity function between options and the items present in the environment. This allows us to effectively reuse a large library of pretrained options, in zero-shot generalization at test time, by restricting goal-directed learning to only those options relevant to the task at hand. We develop a meta-training loop that learns the representations of options and environments over a series of HRL problems, by incorporating feedback about the relevance of retrieved options to the higher-level goal. We evaluate OI-HRL in two simulated settings, the CraftWorld and AI2THOR environments, and show that we achieve performance competitive with oracular baselines, and substantial gains over a baseline that has the entire option pool available for learning the hierarchical policy.

* 10 pages, 4 figures
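
The option-indexing idea in the abstract (score each pretrained option against the items present in the environment, then restrict learning to the best-matching options) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the dot-product affinity, the max aggregation over items, the hand-set two-dimensional embeddings, and the names `affinity` and `retrieve_options` are all hypothetical. In the paper the representations are learned by a meta-training loop, which this sketch does not attempt to reproduce.

```python
from typing import Dict, List, Sequence


def affinity(option_vec: Sequence[float], item_vec: Sequence[float]) -> float:
    """Hypothetical affinity: dot product of an option and an item embedding."""
    return sum(o * i for o, i in zip(option_vec, item_vec))


def retrieve_options(
    option_embs: Dict[str, Sequence[float]],
    item_embs: Dict[str, Sequence[float]],
    present_items: List[str],
    k: int,
) -> List[str]:
    """Return the k options with the highest affinity to any present item.

    Assumption: an option's relevance to an environment is its maximum
    affinity over the items present; other aggregations are possible.
    """
    scores = {
        name: max(affinity(vec, item_embs[item]) for item in present_items)
        for name, vec in option_embs.items()
    }
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:k]


# Toy, hand-set embeddings (illustrative only; not learned).
option_embs = {
    "chop_tree": [1.0, 0.0],
    "mine_iron": [0.0, 1.0],
    "open_fridge": [-1.0, -1.0],
}
item_embs = {
    "tree": [1.0, 0.0],
    "iron_ore": [0.0, 1.0],
    "fridge": [-1.0, -1.0],
}

# An environment containing a tree and iron ore, but no fridge.
selected = retrieve_options(option_embs, item_embs, ["tree", "iron_ore"], k=2)
```

In this toy setup the higher-level policy would then be trained over the options in `selected` only, rather than over the whole option library, which is the restriction the abstract describes.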
