Title: Reward-Respecting Subtasks for Model-Based Reinforcement Learning

Feb 09, 2022

Richard S. Sutton, Marlos C. Machado, G. Zacharias Holland, David Szepesvari, Finbarr Timbers, Brian Tanner, Adam White

Abstract: To achieve the ambitious goals of artificial intelligence, reinforcement learning must include planning with a model of the world that is abstract in state and time. Deep learning has made progress in state abstraction, but, although the theory of time abstraction has been extensively developed based on the options framework, in practice options have rarely been used in planning. One reason for this is that the space of possible options is immense and the methods previously proposed for option discovery do not take into account how the option models will be used in planning. Options are typically discovered by posing subsidiary tasks such as reaching a bottleneck state, or maximizing a sensory signal other than the reward. Each subtask is solved to produce an option, and then a model of the option is learned and made available to the planning process. The subtasks proposed in most previous work ignore the reward on the original problem, whereas we propose subtasks that use the original reward plus a bonus based on a feature of the state at the time the option stops. We show that options and option models obtained from such reward-respecting subtasks are much more likely to be useful in planning and can be learned online and off-policy using existing learning algorithms. Reward-respecting subtasks strongly constrain the space of options and thereby also provide a partial solution to the problem of option discovery. Finally, we show how the algorithms for learning values, policies, options, and models can be unified using general value functions.
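
To make the central idea concrete, here is a minimal sketch of how the return of a reward-respecting subtask might be computed for one option execution: the discounted sum of the original rewards plus a bonus tied to a state feature at the moment the option stops. The abstract does not give the exact formulation, so the function name `subtask_return`, the parameters `bonus_weight` and `stop_feature`, and the multiplicative form of the bonus are all illustrative assumptions, not the authors' definitions.

```python
from typing import Sequence


def subtask_return(
    rewards: Sequence[float],
    stop_feature: float,
    bonus_weight: float,
    gamma: float = 1.0,
) -> float:
    """Return of one option execution under a reward-respecting subtask.

    Assumed form (hypothetical, for illustration only):
        sum_k gamma^k * r_k  +  gamma^T * bonus_weight * stop_feature
    where T is the number of steps taken before the option stops.
    """
    total = 0.0
    discount = 1.0
    for r in rewards:
        total += discount * r   # original task reward is still counted
        discount *= gamma
    # Bonus depends on a feature of the state where the option stops.
    return total + discount * bonus_weight * stop_feature


# Two hypothetical ways of reaching a state where the feature equals 1.0:
cheap_path = [-1.0, -1.0]   # two mildly costly steps
costly_path = [-10.0]       # one very costly step

print(subtask_return(cheap_path, stop_feature=1.0, bonus_weight=5.0))
print(subtask_return(costly_path, stop_feature=1.0, bonus_weight=5.0))
```

A subtask that ignored the original reward would score both paths identically, since both attain the feature. Under the assumed form above, the path that incurs less cost on the original problem scores higher, which is the sense in which such a subtask respects the reward.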
