Abstract: Robust real-world learning should benefit from both demonstrations and interaction with the environment. Current approaches to learning from demonstration and reward perform supervised learning on expert demonstration data and use reinforcement learning to further improve performance based on the reward received from the environment. These tasks have divergent losses that are difficult to optimize jointly, and such methods can be highly sensitive to noisy demonstrations. We propose a unified reinforcement learning algorithm, Normalized Actor-Critic (NAC), that effectively normalizes the Q-function, reducing the Q-values of actions unseen in the demonstration data. NAC learns an initial policy network from demonstrations and refines the policy in the environment, surpassing the demonstrator's performance. Crucially, both learning from demonstration and interactive refinement use the same objective, unlike prior approaches that combine distinct supervised and reinforcement losses. This makes NAC robust to suboptimal demonstration data, since the method is not forced to mimic every example in the dataset. We show that our unified reinforcement learning algorithm learns robustly and outperforms existing baselines when evaluated on several realistic driving games.
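For intuition only, one common way such a normalization can arise (the abstract does not spell out the exact form) is a soft Q-learning parameterization of the policy; the temperature \alpha and the soft value function V below are illustrative symbols introduced here, not definitions quoted from the paper:

% Hedged sketch: assumed soft Q-learning parameterization of the policy.
% The log-policy gradient then contains a normalization term that pushes down
% the Q-values of actions favored by the policy but absent from the demonstrations.
\[
  \pi(a \mid s) = \exp\!\big((Q(s,a) - V(s))/\alpha\big),
  \qquad
  V(s) = \alpha \log \sum_{a'} \exp\!\big(Q(s,a')/\alpha\big),
\]
\[
  \nabla_\theta \log \pi(a \mid s)
  = \frac{1}{\alpha}\Big(\nabla_\theta Q(s,a) - \mathbb{E}_{a' \sim \pi(\cdot \mid s)}\big[\nabla_\theta Q(s,a')\big]\Big).
\]

Under this reading, training on demonstration state-action pairs raises Q(s,a) for demonstrated actions while the expectation term lowers Q for the remaining actions, which is consistent with the normalization behavior described above.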