Off-policy reinforcement learning for control has made great strides in both performance and sample efficiency. We suggest that for many tasks the sample efficiency of modern methods is now limited by the richness of the data collected rather than by the difficulty of policy fitting. We examine why directed exploration methods in the bonus-based exploration (BBE) family have not been more influential in sample-efficient control. Three issues have limited the applicability of BBE: bias with finite samples, slow adaptation to decaying bonuses, and lack of optimism on unseen transitions. We propose modifications to the bonus-based exploration recipe that address each of these limitations. The resulting algorithm, which we call UFO, produces policies that are Unbiased with finite samples, Fast-adapting as the exploration bonus changes, and Optimistic with respect to new transitions. We include experiments showing that rapid directed exploration is a promising direction for improving sample efficiency in control.