Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation

Oct 08, 2024

Chi-Lam Cheang, Guangzeng Chen, Ya Jing, Tao Kong, Hang Li, Yifeng Li, Yuxiao Liu, Hongtao Wu, Jiafeng Xu, Yichu Yang(+2 more)

Figure 1 for GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation

Figure 2 for GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation

Figure 3 for GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation

Figure 4 for GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation

Share this with someone who'll enjoy it:

Abstract:We present GR-2, a state-of-the-art generalist robot agent for versatile and generalizable robot manipulation. GR-2 is first pre-trained on a vast number of Internet videos to capture the dynamics of the world. This large-scale pre-training, involving 38 million video clips and over 50 billion tokens, equips GR-2 with the ability to generalize across a wide range of robotic tasks and environments during subsequent policy learning. Following this, GR-2 is fine-tuned for both video generation and action prediction using robot trajectories. It exhibits impressive multi-task learning capabilities, achieving an average success rate of 97.7% across more than 100 tasks. Moreover, GR-2 demonstrates exceptional generalization to new, previously unseen scenarios, including novel backgrounds, environments, objects, and tasks. Notably, GR-2 scales effectively with model size, underscoring its potential for continued growth and application. Project page: \url{https://gr2-manipulation.github.io}.

* Tech Report. Authors are listed in alphabetical order. Project page: https://gr2-manipulation.github.io

View paper on

Share this with someone who'll enjoy it:

Title:GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation

Paper and Code