Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Austin Patel

GET-Zero: Graph Embodiment Transformer for Zero-shot Embodiment Generalization

Jul 20, 2024

Austin Patel, Shuran Song

Figure 1 for GET-Zero: Graph Embodiment Transformer for Zero-shot Embodiment Generalization

Figure 2 for GET-Zero: Graph Embodiment Transformer for Zero-shot Embodiment Generalization

Figure 3 for GET-Zero: Graph Embodiment Transformer for Zero-shot Embodiment Generalization

Figure 4 for GET-Zero: Graph Embodiment Transformer for Zero-shot Embodiment Generalization

Abstract:This paper introduces GET-Zero, a model architecture and training procedure for learning an embodiment-aware control policy that can immediately adapt to new hardware changes without retraining. To do so, we present Graph Embodiment Transformer (GET), a transformer model that leverages the embodiment graph connectivity as a learned structural bias in the attention mechanism. We use behavior cloning to distill demonstration data from embodiment-specific expert policies into an embodiment-aware GET model that conditions on the hardware configuration of the robot to make control decisions. We conduct a case study on a dexterous in-hand object rotation task using different configurations of a four-fingered robot hand with joints removed and with link length extensions. Using the GET model along with a self-modeling loss enables GET-Zero to zero-shot generalize to unseen variation in graph structure and link length, yielding a 20% improvement over baseline methods. All code and qualitative video results are on https://get-zero-paper.github.io

* 13 pages, 6 figures, 4 tables, website https://get-zero-paper.github.io

Via

Access Paper or Ask Questions

Learning to Imitate Object Interactions from Internet Videos

Nov 23, 2022

Austin Patel, Andrew Wang, Ilija Radosavovic, Jitendra Malik

Figure 1 for Learning to Imitate Object Interactions from Internet Videos

Figure 2 for Learning to Imitate Object Interactions from Internet Videos

Figure 3 for Learning to Imitate Object Interactions from Internet Videos

Figure 4 for Learning to Imitate Object Interactions from Internet Videos

Abstract:We study the problem of imitating object interactions from Internet videos. This requires understanding the hand-object interactions in 4D, spatially in 3D and over time, which is challenging due to mutual hand-object occlusions. In this paper we make two main contributions: (1) a novel reconstruction technique RHOV (Reconstructing Hands and Objects from Videos), which reconstructs 4D trajectories of both the hand and the object using 2D image cues and temporal smoothness constraints; (2) a system for imitating object interactions in a physics simulator with reinforcement learning. We apply our reconstruction technique to 100 challenging Internet videos. We further show that we can successfully imitate a range of different object interactions in a physics simulator. Our object-centric approach is not limited to human-like end-effectors and can learn to imitate object interactions using different embodiments, like a robotic arm with a parallel jaw gripper.

* Project page: https://austinapatel.github.io/imitate-video

Via

Access Paper or Ask Questions