Nolan Wagener

H-GAP: Humanoid Control with a Generalist Planner

Dec 05, 2023

Zhengyao Jiang, Yingchen Xu, Nolan Wagener, Yicheng Luo, Michael Janner, Edward Grefenstette, Tim Rocktäschel, Yuandong Tian

Abstract:Humanoid control is an important research challenge, offering avenues for integration into human-centric infrastructures and enabling physics-driven humanoid animations. The daunting challenges in this field stem from the difficulty of optimizing in high-dimensional action spaces and the instability introduced by the bipedal morphology of humanoids. However, the extensive collection of human motion-capture data and derived datasets of humanoid trajectories, such as MoCapAct, pave the way to tackling these challenges. In this context, we present the Humanoid Generalist Autoencoding Planner (H-GAP), a state-action trajectory generative model trained on humanoid trajectories derived from human motion-capture data, capable of adeptly handling downstream control tasks with Model Predictive Control (MPC). For a humanoid with 56 degrees of freedom, we empirically demonstrate that H-GAP learns to represent and generate a wide range of motor behaviors. Further, without any learning from online interactions, it can flexibly transfer these behaviors to solve novel downstream control tasks via planning. Notably, H-GAP outperforms established MPC baselines that have access to the ground-truth dynamics model, and is superior or comparable to offline RL methods trained for individual tasks. Finally, we conduct a series of empirical studies on the scaling properties of H-GAP, showing the potential for performance gains from additional data but not from additional compute. Code and videos are available at https://ycxuyingchen.github.io/hgap/.

* 18 pages including appendix, 4 figures
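
The abstract describes handing downstream tasks to MPC on top of a trajectory generative model. A generic way to do this is sample-and-select: draw candidate futures from the model, score them with the task objective, execute the first action of the best one, and replan. The sketch below assumes that pattern; `sample_trajectories` is a hypothetical stand-in for the learned prior (it just draws Gaussian actions) and `task_return` is a hypothetical objective. H-GAP's actual planner may differ.

```python
import numpy as np

def sample_trajectories(rng, state, num_samples, horizon, action_dim):
    """Hypothetical stand-in for a learned trajectory prior.

    A real generative model would condition on `state` and propose plausible
    state-action sequences; here we simply draw Gaussian action sequences.
    """
    return rng.normal(size=(num_samples, horizon, action_dim))

def task_return(actions, target):
    """Hypothetical downstream objective: keep every action close to `target`."""
    return -np.sum((actions - target) ** 2, axis=(1, 2))

def mpc_step(rng, state, target, num_samples=64, horizon=8, action_dim=3):
    """One receding-horizon step: sample plans, score them, keep the best.

    Returns the first action of the best plan (to execute now), the full plan,
    and its score. The caller replans at the next step.
    """
    candidates = sample_trajectories(rng, state, num_samples, horizon, action_dim)
    scores = task_return(candidates, target)
    best = int(np.argmax(scores))
    return candidates[best, 0], candidates[best], scores[best]
```

In a control loop, `mpc_step` would be called once per environment step with the current state, executing only the returned first action.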

TerrainNet: Visual Modeling of Complex Terrain for High-speed, Off-road Navigation

Mar 28, 2023

Xiangyun Meng, Nathan Hatch, Alexander Lambert, Anqi Li, Nolan Wagener, Matthew Schmittle, JoonHo Lee, Wentao Yuan, Zoey Chen, Samuel Deng(+4 more)

Abstract:Effective use of camera-based vision systems is essential for robust performance in autonomous off-road driving, particularly in the high-speed regime. Despite success in structured, on-road settings, current end-to-end approaches for scene prediction have yet to be successfully adapted to complex outdoor terrain. To this end, we present TerrainNet, a vision-based terrain perception system for semantic and geometric terrain prediction for aggressive, off-road navigation. The approach relies on several key insights and practical considerations for achieving reliable terrain modeling. The network includes a multi-headed output representation to capture fine- and coarse-grained terrain features necessary for estimating traversability. Accurate depth estimation is achieved using self-supervised depth completion with multi-view RGB and stereo inputs. Requirements for real-time performance and fast inference speeds are met using efficient, learned image feature projections. Furthermore, the model is trained on a large-scale, real-world off-road dataset collected across diverse outdoor environments. We show how TerrainNet can also be used for costmap prediction and provide a detailed framework for integration into a planning module. We demonstrate the performance of TerrainNet through extensive comparison to current state-of-the-art baselines for camera-only scene prediction. Finally, we showcase the effectiveness of integrating TerrainNet within a complete autonomous-driving stack by conducting a real-world vehicle test in a challenging off-road scenario.

MoCapAct: A Multi-Task Dataset for Simulated Humanoid Control

Aug 15, 2022

Nolan Wagener, Andrey Kolobov, Felipe Vieira Frujeri, Ricky Loynd, Ching-An Cheng, Matthew Hausknecht

Abstract:Simulated humanoids are an appealing research domain due to their physical capabilities. Nonetheless, they are also challenging to control, as a policy must drive an unstable, discontinuous, and high-dimensional physical system. One widely studied approach is to utilize motion capture (MoCap) data to teach the humanoid agent low-level skills (e.g., standing, walking, and running) that can then be re-used to synthesize high-level behaviors. However, even with MoCap data, controlling simulated humanoids remains very hard, as MoCap data offers only kinematic information. Finding physical control inputs to realize the demonstrated motions requires computationally intensive methods like reinforcement learning. Thus, despite the publicly available MoCap data, its utility has been limited to institutions with large-scale compute. In this work, we dramatically lower the barrier for productive research on this topic by training and releasing high-quality agents that can track over three hours of MoCap data for a simulated humanoid in the dm_control physics-based environment. We release MoCapAct (Motion Capture with Actions), a dataset of these expert agents and their rollouts, which contain proprioceptive observations and actions. We demonstrate the utility of MoCapAct by using it to train a single hierarchical policy capable of tracking the entire MoCap dataset within dm_control and show the learned low-level component can be re-used to efficiently learn downstream high-level tasks. Finally, we use MoCapAct to train an autoregressive GPT model and show that it can control a simulated humanoid to perform natural motion completion given a motion prompt. Videos of the results and links to the code and dataset are available at https://microsoft.github.io/MoCapAct.

Consistent Dropout for Policy Gradient Reinforcement Learning

Feb 23, 2022

Matthew Hausknecht, Nolan Wagener

Abstract:Dropout has long been a staple of supervised learning, but is rarely used in reinforcement learning. We analyze why naive application of dropout is problematic for policy-gradient learning algorithms and introduce consistent dropout, a simple technique to address this instability. We demonstrate that consistent dropout enables stable training with A2C and PPO in both continuous and discrete action environments across a wide range of dropout probabilities. Finally, we show that consistent dropout enables the online training of complex architectures such as GPT without needing to disable the model's native dropout.
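
A minimal sketch of the idea, assuming consistent dropout amounts to storing the dropout mask sampled during the rollout and reusing it when the update recomputes log pi(a|s). With a freshly resampled mask, the update would evaluate a different sub-network than the one that chose the action, so the policy-gradient ratio mixes two networks. The two-layer softmax policy and all function names here are hypothetical illustrations, not the paper's code.

```python
import numpy as np

def policy_logits(params, obs, mask, p):
    """Two-layer policy with inverted dropout on the hidden layer.

    `mask` is a 0/1 array; passing the same mask reproduces the same sub-network.
    """
    W1, W2 = params
    h = np.maximum(obs @ W1, 0.0)
    h = h * mask / (1.0 - p)  # inverted dropout scaling
    return h @ W2

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def rollout_step(rng, params, obs, p):
    """Sample a dropout mask and an action; return the mask so it can be stored."""
    hidden = params[0].shape[1]
    mask = (rng.random(hidden) >= p).astype(obs.dtype)  # keep units with prob 1 - p
    logp = log_softmax(policy_logits(params, obs, mask, p))
    action = rng.choice(len(logp), p=np.exp(logp))
    return action, mask, logp[action]

def update_logp(params, obs, action, mask, p):
    """Recompute log pi(a|s) at update time using the *stored* rollout mask."""
    return log_softmax(policy_logits(params, obs, mask, p))[action]
```

Before any parameter change, the recomputed log-probability matches the one recorded at rollout time, so an A2C/PPO-style ratio starts at exactly 1.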

Safe Reinforcement Learning Using Advantage-Based Intervention

Jul 19, 2021

Nolan Wagener, Byron Boots, Ching-An Cheng

Abstract:Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints. Although much recent research has focused on the development of safe reinforcement learning (RL) algorithms that produce a safe policy after training, ensuring safety during training as well remains an open problem. A fundamental challenge is performing exploration while still satisfying constraints in an unknown Markov decision process (MDP). In this work, we address this problem for the chance-constrained setting. We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training and optimizes the agent's policy using off-the-shelf RL algorithms designed for unconstrained MDPs. Our method comes with strong guarantees on safety during both training and deployment (i.e., after training and without the intervention mechanism) and policy performance compared to the optimal safety-constrained policy. In our experiments, we show that SAILR violates constraints far less during training than standard safe RL and constrained MDP approaches and converges to a well-performing policy that can be deployed safely without intervention. Our code is available at https://github.com/nolanwagener/safe_rl.

* Appearing in ICML 2021. 29 pages, 8 figures
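
A minimal sketch of an advantage-based intervention rule, assuming it works as follows: the cost advantage of the agent's proposed action, measured under a backup policy, is compared against a tolerance `eta`, and the backup action is substituted when that advantage is too large. `q_cost`, `v_cost`, `backup`, and `eta` are hypothetical names for quantities the user must supply; this is not the code from the linked repository.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AdvantageIntervention:
    """Advantage-based intervention rule (illustrative sketch).

    q_cost(s, a): expected future cost of taking `a` in `s`, then following the backup policy.
    v_cost(s):    expected future cost of following the backup policy from `s`.
    backup(s):    the backup policy's action in `s`.
    eta:          how much worse than the backup an action may be before intervening.
    """
    q_cost: Callable
    v_cost: Callable
    backup: Callable
    eta: float

    def advantage(self, s, a):
        return self.q_cost(s, a) - self.v_cost(s)

    def filter_action(self, s, a):
        """Return (action_to_execute, intervened)."""
        if self.advantage(s, a) > self.eta:
            return self.backup(s), True
        return a, False
```

The learner itself can then be any off-the-shelf RL algorithm; one natural choice, assumed here, is to treat an intervention as the end of the agent's episode with a penalty, so the agent learns to avoid triggering it.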

An Online Learning Approach to Model Predictive Control

Feb 24, 2019

Nolan Wagener, Ching-An Cheng, Jacob Sacks, Byron Boots

Abstract:Model predictive control (MPC) is a powerful technique for solving dynamic control tasks. In this paper, we show that there exists a close connection between MPC and online learning, an abstract theoretical framework for analyzing online decision making in the optimization literature. This new perspective provides a foundation for leveraging powerful online learning algorithms to design MPC algorithms. Specifically, we propose a new algorithm based on dynamic mirror descent (DMD), an online learning algorithm that is designed for non-stationary setups. Our algorithm, Dynamic Mirror Descent Model Predictive Control (DMD-MPC), represents a general family of MPC algorithms that includes many existing techniques as special instances. DMD-MPC also provides a fresh perspective on previous heuristics used in MPC and suggests a principled way to design new MPC algorithms. In the experimental section of this paper, we demonstrate the flexibility of DMD-MPC, presenting a set of new MPC algorithms on a simulated cartpole task and on simulated and real-world aggressive driving tasks.
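
As a sketch of one special instance, assume a Gaussian distribution over control sequences and exponentiated-cost weighting of samples; to our understanding this yields an MPPI-like update in which the mean moves a step of size `step` toward the cost-weighted sample average. The 1-D integrator, cost weights, and parameter values below are hypothetical choices for illustration, and `shift` is a common warm-start heuristic we assume plays the role of the shift between time steps.

```python
import numpy as np

def rollout_cost(x0, controls, goal, dt=0.1, u_weight=0.01):
    """Cost of control sequences for a 1-D integrator x_{t+1} = x_t + u_t * dt.

    `controls` has shape (..., H); returns one cost per sequence.
    """
    x = x0 + np.cumsum(controls * dt, axis=-1)
    return np.sum((x - goal) ** 2, axis=-1) + u_weight * np.sum(controls ** 2, axis=-1)

def dmd_mpc_update(rng, x0, mean, goal, sigma=1.0, num_samples=256, step=0.5, lam=1.0):
    """One mirror-descent-style update of the Gaussian mean (MPPI-like special case)."""
    samples = mean + sigma * rng.normal(size=(num_samples, mean.size))
    costs = rollout_cost(x0, samples, goal)
    w = np.exp(-(costs - costs.min()) / lam)  # exponentiated cost, shifted for stability
    w /= w.sum()
    return (1.0 - step) * mean + step * (w @ samples)

def shift(mean):
    """Warm start for the next time step: drop the executed control, repeat the last one."""
    return np.concatenate([mean[1:], mean[-1:]])
```

Setting `step=1.0` replaces the mean with the weighted sample average outright, while smaller steps keep part of the previous plan, which is the kind of step-size choice the online-learning view makes explicit.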

Fast Policy Learning through Imitation and Reinforcement

May 26, 2018

Ching-An Cheng, Xinyan Yan, Nolan Wagener, Byron Boots

Abstract:Imitation learning (IL) consists of a set of tools that leverage expert demonstrations to quickly learn policies. However, if the expert is suboptimal, IL can yield policies with inferior performance compared to reinforcement learning (RL). In this paper, we aim to provide an algorithm that combines the best aspects of RL and IL. We accomplish this by formulating several popular RL and IL algorithms in a common mirror descent framework, showing that these algorithms can be viewed as a variation on a single approach. We then propose LOKI, a strategy for policy learning that first performs a small but random number of IL iterations before switching to a policy gradient RL method. We show that if the switching time is properly randomized, LOKI can learn to outperform a suboptimal expert and converge faster than running policy gradient from scratch. Finally, we evaluate the performance of LOKI experimentally in several simulated environments.
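
The switching rule in the abstract can be sketched as a schedule: draw a random switch time K, run K imitation-learning iterations, then run policy-gradient iterations for the rest of training. Drawing K uniformly from `[k_min, k_max]` is an assumption here, and `il_step` / `pg_step` are hypothetical placeholders for, e.g., a DAgger-style update and a policy-gradient update.

```python
import random
from typing import Callable, List

def loki_schedule(total_iters: int, k_min: int, k_max: int, rng: random.Random) -> List[str]:
    """Phase of each iteration: 'IL' for the first K iterations, then 'RL'.

    K is drawn uniformly from [k_min, k_max] (inclusive); the randomized
    switch time is the ingredient the abstract highlights.
    """
    k = rng.randint(k_min, k_max)
    return ["IL" if i < k else "RL" for i in range(total_iters)]

def train(policy, total_iters: int, k_min: int, k_max: int,
          il_step: Callable, pg_step: Callable, seed: int = 0):
    """Hypothetical driver that applies the IL or RL update according to the schedule."""
    rng = random.Random(seed)
    for phase in loki_schedule(total_iters, k_min, k_max, rng):
        policy = il_step(policy) if phase == "IL" else pg_step(policy)
    return policy
```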

Learning Contact-Rich Manipulation Skills with Guided Policy Search

Feb 26, 2015

Sergey Levine, Nolan Wagener, Pieter Abbeel

Abstract:Autonomous learning of object manipulation skills can enable robots to acquire rich behavioral repertoires that scale to the variety of objects found in the real world. However, current motion skill learning methods typically restrict the behavior to a compact, low-dimensional representation, limiting its expressiveness and generality. In this paper, we extend a recently developed policy search method (Levine and Abbeel, 2014) and use it to learn a range of dynamic manipulation behaviors with highly general policy representations, without using known models or example demonstrations. Our approach learns a set of trajectories for the desired motion skill by using iteratively refitted time-varying linear models, and then unifies these trajectories into a single control policy that can generalize to new situations. To enable this method to run on a real robot, we introduce several improvements that reduce the sample count and automate parameter selection. We show that our method can acquire fast, fluent behaviors after only minutes of interaction time, and can learn robust controllers for complex tasks, including putting together a toy airplane, stacking tight-fitting Lego blocks, placing wooden rings onto tight-fitting pegs, inserting a shoe tree into a shoe, and screwing bottle caps onto bottles.

* S. Levine, N. Wagener, P. Abbeel, "Learning Contact-Rich Manipulation Skills with Guided Policy Search," in International Conference on Robotics and Automation (ICRA), 2015
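
The "iteratively refitted time-varying linear models" step can be illustrated with plain per-time-step least squares: from N sampled trajectories, fit x_{t+1} ~= A_t x_t + B_t u_t + c_t separately at each t. This is a sketch only; the sample-count-reducing improvements the abstract mentions are not reproduced, and the function name is hypothetical.

```python
import numpy as np

def fit_time_varying_linear(X, U):
    """Fit x_{t+1} ~= A_t x_t + B_t u_t + c_t independently for each time step.

    X: states, shape (N, T+1, dx); U: controls, shape (N, T, du), from N trajectories.
    Returns lists of A_t (dx, dx), B_t (dx, du), and c_t (dx,).
    """
    N, T, du = U.shape
    dx = X.shape[2]
    A, B, c = [], [], []
    for t in range(T):
        # Regressors [x_t, u_t, 1] for every trajectory at time t.
        Z = np.hstack([X[:, t], U[:, t], np.ones((N, 1))])
        # Solve Z @ W ~= x_{t+1}; W stacks A_t^T, B_t^T, and c_t^T row-wise.
        W, *_ = np.linalg.lstsq(Z, X[:, t + 1], rcond=None)
        A.append(W[:dx].T)
        B.append(W[dx:dx + du].T)
        c.append(W[-1])
    return A, B, c
```

Each fit needs at least dx + du + 1 trajectories per time step to be well determined. This is one reason sample efficiency matters on a real robot, where every trajectory costs interaction time.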
