Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Gregory M. Gremillion

Gaze-Informed Multi-Objective Imitation Learning from Human Demonstrations

Feb 25, 2021

Ritwik Bera, Vinicius G. Goecks, Gregory M. Gremillion, Vernon J. Lawhern, John Valasek, Nicholas R. Waytowich

Figure 1 for Gaze-Informed Multi-Objective Imitation Learning from Human Demonstrations

Figure 2 for Gaze-Informed Multi-Objective Imitation Learning from Human Demonstrations

Figure 3 for Gaze-Informed Multi-Objective Imitation Learning from Human Demonstrations

Figure 4 for Gaze-Informed Multi-Objective Imitation Learning from Human Demonstrations

Abstract:In the field of human-robot interaction, teaching learning agents from human demonstrations via supervised learning has been widely studied and successfully applied to multiple domains such as self-driving cars and robot manipulation. However, the majority of the work on learning from human demonstrations utilizes only behavioral information from the demonstrator, i.e. what actions were taken, and ignores other useful information. In particular, eye gaze information can give valuable insight towards where the demonstrator is allocating their visual attention, and leveraging such information has the potential to improve agent performance. Previous approaches have only studied the utilization of attention in simple, synchronous environments, limiting their applicability to real-world domains. This work proposes a novel imitation learning architecture to learn concurrently from human action demonstration and eye tracking data to solve tasks where human gaze information provides important context. The proposed method is applied to a visual navigation task, in which an unmanned quadrotor is trained to search for and navigate to a target vehicle in a real-world, photorealistic simulated environment. When compared to a baseline imitation learning architecture, results show that the proposed gaze augmented imitation learning model is able to learn policies that achieve significantly higher task completion rates, with more efficient paths, while simultaneously learning to predict human visual attention. This research aims to highlight the importance of multimodal learning of visual attention information from additional human input modalities and encourages the community to adopt them when training agents from human demonstrations to perform visuomotor tasks.

Via

Access Paper or Ask Questions

PODNet: A Neural Network for Discovery of Plannable Options

Nov 15, 2019

Ritwik Bera, Vinicius G. Goecks, Gregory M. Gremillion, John Valasek, Nicholas R. Waytowich

Figure 1 for PODNet: A Neural Network for Discovery of Plannable Options

Figure 2 for PODNet: A Neural Network for Discovery of Plannable Options

Abstract:Learning from demonstration has been widely studied in machine learning but becomes challenging when the demonstrated trajectories are unstructured and follow different objectives. This short-paper proposes PODNet, Plannable Option Discovery Network, addressing how to segment an unstructured set of demonstrated trajectories for option discovery. This enables learning from demonstration to perform multiple tasks and plan high-level trajectories based on the discovered option labels. PODNet combines a custom categorical variational autoencoder, a recurrent option inference network, option-conditioned policy network, and option dynamics model in an end-to-end learning architecture. Due to the concurrently trained option-conditioned policy network and option dynamics model, the proposed architecture has implications in multi-task and hierarchical learning, explainable and interpretable artificial intelligence, and applications where the agent is required to learn only from observations.

Via

Access Paper or Ask Questions

Integrating Behavior Cloning and Reinforcement Learning for Improved Performance in Sparse Reward Environments

Oct 09, 2019

Vinicius G. Goecks, Gregory M. Gremillion, Vernon J. Lawhern, John Valasek, Nicholas R. Waytowich

Figure 1 for Integrating Behavior Cloning and Reinforcement Learning for Improved Performance in Sparse Reward Environments

Figure 2 for Integrating Behavior Cloning and Reinforcement Learning for Improved Performance in Sparse Reward Environments

Figure 3 for Integrating Behavior Cloning and Reinforcement Learning for Improved Performance in Sparse Reward Environments

Figure 4 for Integrating Behavior Cloning and Reinforcement Learning for Improved Performance in Sparse Reward Environments

Abstract:This paper investigates how to efficiently transition and update policies, trained initially with demonstrations, using off-policy actor-critic reinforcement learning. It is well-known that techniques based on Learning from Demonstrations, for example behavior cloning, can lead to proficient policies given limited data. However, it is currently unclear how to efficiently update that policy using reinforcement learning as these approaches are inherently optimizing different objective functions. Previous works have used loss functions which combine behavioral cloning losses with reinforcement learning losses to enable this update, however, the components of these loss functions are often set anecdotally, and their individual contributions are not well understood. In this work we propose the Cycle-of-Learning (CoL) framework that uses an actor-critic architecture with a loss function that combines behavior cloning and 1-step Q-learning losses with an off-policy pre-training step from human demonstrations. This enables transition from behavior cloning to reinforcement learning without performance degradation and improves reinforcement learning in terms of overall performance and training time. Additionally, we carefully study the composition of these combined losses and their impact on overall policy learning. We show that our approach outperforms state-of-the-art techniques for combining behavior cloning and reinforcement learning for both dense and sparse reward scenarios. Our results also suggest that directly including the behavior cloning loss on demonstration data helps to ensure stable learning and ground future policy updates.

* 9 pages, 3 Figures

Via

Access Paper or Ask Questions

Efficiently Combining Human Demonstrations and Interventions for Safe Training of Autonomous Systems in Real-Time

Nov 28, 2018

Vinicius G. Goecks, Gregory M. Gremillion, Vernon J. Lawhern, John Valasek, Nicholas R. Waytowich

Figure 1 for Efficiently Combining Human Demonstrations and Interventions for Safe Training of Autonomous Systems in Real-Time

Figure 2 for Efficiently Combining Human Demonstrations and Interventions for Safe Training of Autonomous Systems in Real-Time

Figure 3 for Efficiently Combining Human Demonstrations and Interventions for Safe Training of Autonomous Systems in Real-Time

Figure 4 for Efficiently Combining Human Demonstrations and Interventions for Safe Training of Autonomous Systems in Real-Time

Abstract:This paper investigates how to utilize different forms of human interaction to safely train autonomous systems in real-time by learning from both human demonstrations and interventions. We implement two components of the Cycle-of-Learning for Autonomous Systems, which is our framework for combining multiple modalities of human interaction. The current effort employs human demonstrations to teach a desired behavior via imitation learning, then leverages intervention data to correct for undesired behaviors produced by the imitation learner to teach novel tasks to an autonomous agent safely, after only minutes of training. We demonstrate this method in an autonomous perching task using a quadrotor with continuous roll, pitch, yaw, and throttle commands and imagery captured from a downward-facing camera in a high-fidelity simulated environment. Our method improves task completion performance for the same amount of human interaction when compared to learning from demonstrations alone, while also requiring on average 32% less data to achieve that performance. This provides evidence that combining multiple modes of human interaction can increase both the training speed and overall performance of policies for autonomous systems.

* 9 pages, 6 figures

Via

Access Paper or Ask Questions