Abstract: Hierarchical control for robotics has long been hampered by the need for a well-defined interface layer through which high-level task planners communicate with low-level policies. With the advent of LLMs, language has emerged as a prospective interface layer. However, this approach has several limitations. Not all tasks can be decomposed into steps that are easily expressible in natural language (e.g., performing a dance routine). Furthermore, it makes end-to-end finetuning on embodied data challenging due to domain shift and catastrophic forgetting. We introduce Learnable Latent Codes as Bridges (LCB), an alternate architecture that overcomes these limitations. LCB uses a learnable latent code to act as a bridge between LLMs and low-level policies. This enables LLMs to flexibly communicate goals in the task plan without being entirely constrained by language limitations. Additionally, it enables end-to-end finetuning without destroying the embedding space of word tokens learned during pre-training. Through experiments on Language Table and CALVIN, two common language-based benchmarks for embodied agents, we find that LCB outperforms baselines (including those with GPT-4V) that leverage pure language as the interface layer on tasks that require reasoning and multi-step behaviors.
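To make the bridge concrete, a minimal sketch is shown below, assuming a HuggingFace-style LLM that accepts `inputs_embeds` and a policy that takes a goal latent; the module names, dimensions, and the `<ACT>` token are illustrative assumptions, not the LCB release. The key point is that action supervision flows through a fresh learnable token and a small projection, so the pretrained word-token embeddings are left untouched.

```python
# Minimal sketch of a latent-code bridge between an LLM and a low-level policy.
# Module names, dimensions, and interfaces are illustrative assumptions.
import torch
import torch.nn as nn

class LatentCodeBridge(nn.Module):
    def __init__(self, llm: nn.Module, policy: nn.Module,
                 llm_dim: int = 4096, latent_dim: int = 256):
        super().__init__()
        self.llm = llm                      # assumed HuggingFace-style language model
        self.policy = policy                # low-level visuomotor policy (assumed signature)
        # learnable embedding for a special <ACT> token appended to the prompt
        self.act_token = nn.Parameter(torch.randn(1, 1, llm_dim) * 0.02)
        # projects the <ACT> hidden state into the policy's goal-conditioning space
        self.project = nn.Linear(llm_dim, latent_dim)

    def forward(self, prompt_embeds: torch.Tensor, obs: torch.Tensor) -> torch.Tensor:
        # append the special token so the LLM summarizes its plan into one slot
        b = prompt_embeds.shape[0]
        inputs = torch.cat([prompt_embeds, self.act_token.expand(b, -1, -1)], dim=1)
        hidden = self.llm(inputs_embeds=inputs).last_hidden_state
        latent = self.project(hidden[:, -1])         # hidden state of <ACT>
        return self.policy(obs, goal_latent=latent)  # predicted low-level action
```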
Abstract: Imitation learning from human demonstrations is a powerful framework to teach robots new skills. However, the performance of the learned policies is bottlenecked by the quality, scale, and variety of the demonstration data. In this paper, we aim to lower the barrier to collecting large and high-quality human demonstration data by proposing GELLO, a general framework for building low-cost and intuitive teleoperation systems for robotic manipulation. Given a target robot arm, we build a GELLO controller that has the same kinematic structure as the target arm, leveraging 3D-printed parts and off-the-shelf motors. GELLO is easy to build and intuitive to use. Through an extensive user study, we show that GELLO enables more reliable and efficient demonstration collection than teleoperation devices commonly used in the imitation learning literature, such as VR controllers and 3D spacemouses. We further demonstrate the capabilities of GELLO for performing complex bi-manual and contact-rich manipulation tasks. To make GELLO accessible to everyone, we have designed and built GELLO systems for 3 commonly used robotic arms: Franka, UR5, and xArm. All software and hardware are open-sourced and can be found on our website: https://wuphilipp.github.io/gello/.
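Because the GELLO controller shares the target arm's kinematic structure, teleoperation reduces to streaming joint positions from the leader to the follower. The loop below is a rough sketch of that idea with hypothetical `LeaderArm`/`FollowerArm` driver objects and a simple log; the actual drivers and control rates live in the open-sourced code on the project website.

```python
# Sketch of joint-space teleoperation with a kinematically matched leader arm.
# `leader` and `follower` are hypothetical driver interfaces, not GELLO's API.
import time

CONTROL_HZ = 50  # assumed control rate

def teleop_loop(leader, follower, log):
    """Mirror leader joint angles on the follower and record a demonstration."""
    while True:
        q = leader.read_joint_positions()      # radians, one entry per joint
        grip = leader.read_gripper_position()  # e.g. normalized to [0, 1]
        follower.command_joint_positions(q)
        follower.command_gripper(grip)
        log.append({"t": time.time(), "q": q, "gripper": grip})
        time.sleep(1.0 / CONTROL_HZ)
```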
Abstract: 3D bounding boxes are a widespread intermediate representation in many computer vision applications. However, predicting them is a challenging task, largely due to partial observability, which motivates the need for a strong sense of uncertainty. While many recent methods have explored better architectures for consuming sparse and unstructured point cloud data, we hypothesize that there is room for improvement in the modeling of the output distribution, and we explore how this can be achieved using an autoregressive prediction head. Additionally, we release a simulated dataset, COB-3D, which highlights new types of ambiguity that arise in real-world robotics applications, where 3D bounding box prediction has largely been underexplored. We propose methods for leveraging our autoregressive model to make high-confidence predictions and meaningful uncertainty measures, achieving strong results on SUN-RGBD, ScanNet, KITTI, and our new dataset.
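One plausible instantiation of such an autoregressive head is sketched below, with assumed bin counts, parameter ordering, and feature sizes rather than the paper's exact architecture: each discretized box parameter is predicted as a token conditioned on the parameters emitted so far, so the joint distribution over boxes factorizes into per-step categoricals. At test time, sampling the chain several times or inspecting per-step entropies gives a tractable handle on ambiguity under occlusion.

```python
# Sketch of an autoregressive head over discretized 3D box parameters.
# Bin count, feature size, and parameter ordering are illustrative assumptions.
import torch
import torch.nn as nn

N_PARAMS = 8   # e.g. center (3), size (3), heading as sin/cos (2)
N_BINS = 128   # per-parameter discretization

class AutoregressiveBoxHead(nn.Module):
    def __init__(self, feat_dim: int = 256, hidden: int = 256):
        super().__init__()
        self.embed = nn.Embedding(N_BINS, hidden)
        self.start = nn.Parameter(torch.zeros(1, 1, hidden))   # start-of-box token
        self.rnn = nn.GRU(hidden + feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, N_BINS)

    def forward(self, feat: torch.Tensor, target_bins: torch.Tensor) -> torch.Tensor:
        # teacher forcing: condition step t on ground-truth bins < t plus object features
        b = feat.shape[0]
        prev = torch.cat([self.start.expand(b, -1, -1),
                          self.embed(target_bins[:, :-1])], dim=1)
        inp = torch.cat([prev, feat[:, None].expand(-1, N_PARAMS, -1)], dim=2)
        h, _ = self.rnn(inp)
        return self.out(h)   # (b, N_PARAMS, N_BINS) logits for per-step cross-entropy
```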
Abstract: We present an approach for building an active agent that learns to segment its visual observations into individual objects by interacting with its environment in a completely self-supervised manner. The agent uses its current segmentation model to infer pixels that constitute objects and refines the segmentation model by interacting with these pixels. The model learned from over 50K interactions generalizes to novel objects and backgrounds. To deal with the noisy training signal for segmenting objects obtained from self-supervised interactions, we propose a robust set loss. A dataset of the robot's interactions, along with a few human-labeled examples, is provided as a benchmark for future research. We test the utility of the learned segmentation model by providing results on a downstream vision-based control task of rearranging multiple objects into target configurations from visual inputs alone. Videos, code, and the robotic interaction dataset are available at https://pathak22.github.io/seg-by-interaction/
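The interaction loop described above (use the current model to pick a pixel to poke, then harvest a mask from the resulting scene change) might look roughly like the sketch below. Here `robot`, `camera`, and `mask_from_difference` are hypothetical helpers, and the paper's robust set loss is what would absorb the noise in the harvested masks during training.

```python
# Sketch of segmentation-by-interaction data collection.
# `robot`, `camera`, and `mask_from_difference` are hypothetical helpers.
import numpy as np

def collect_interaction_example(model, robot, camera):
    """Poke a likely object pixel and turn the scene change into a training mask."""
    before = camera.capture()                      # H x W x 3 image
    probs = model.object_probability(before)       # H x W objectness map
    y, x = np.unravel_index(np.argmax(probs), probs.shape)
    robot.poke(pixel=(x, y))                       # push at the chosen pixel
    after = camera.capture()
    # pixels that moved are treated as a (noisy) mask of the poked object
    noisy_mask = mask_from_difference(before, after)
    return before, noisy_mask
```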
Abstract: The current dominant paradigm for imitation learning relies on strong supervision of expert actions to learn both 'what' and 'how' to imitate. We pursue an alternative paradigm wherein an agent first explores the world without any expert supervision and then distills its experience into a goal-conditioned skill policy with a novel forward consistency loss. In our framework, the role of the expert is only to communicate the goals (i.e., what to imitate) during inference. The learned policy is then employed to mimic the expert (i.e., how to imitate) after seeing just a sequence of images demonstrating the desired task. Our method is 'zero-shot' in the sense that the agent never has access to expert actions, either during training or for the task demonstration at inference. We evaluate our zero-shot imitator in two real-world settings: complex rope manipulation with a Baxter robot and navigation in previously unseen office environments with a TurtleBot. Through further experiments in VizDoom simulation, we provide evidence that better mechanisms for exploration lead to a more capable learned policy, which in turn improves end-task performance. Videos, models, and more details are available at https://pathak22.github.io/zeroshot-imitation/
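One reading of the forward consistency idea is sketched below, with assumed `policy` and `forward_model` interfaces: instead of requiring the policy's action to exactly match the executed action, the action is scored by whether a learned forward model predicts that it reaches the observed next state, which tolerates different actions that produce the same outcome.

```python
# Sketch of a forward-consistency style objective for a goal-conditioned policy.
# `policy` and `forward_model` are assumed components, not the released code.
import torch.nn.functional as F

def forward_consistency_loss(policy, forward_model, s_t, s_next, a_t):
    """Judge the policy's action by the state it leads to, not by exact action match."""
    a_hat = policy(s_t, goal=s_next)      # action the policy proposes to reach the goal
    s_hat = forward_model(s_t, a_hat)     # where that proposed action would lead
    s_ref = forward_model(s_t, a_t)       # where the actually executed action leads
    # both rollouts should land on the observed next state (also trains the forward model)
    return F.mse_loss(s_hat, s_next) + F.mse_loss(s_ref, s_next)
```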