Tao Kong

GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation

Oct 08, 2024

World Model-based Perception for Visual Legged Locomotion

Sep 25, 2024

GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy

Aug 26, 2024

IRASim: Learning Interactive Real-Robot Action Simulators

Jun 20, 2024

SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction

Feb 20, 2024

Towards Unified Interactive Visual Grounding in The Wild

Jan 30, 2024

Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation

Dec 21, 2023

Vision-Language Foundation Models as Effective Robot Imitators

Nov 06, 2023

InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions

Oct 18, 2023

MOMA-Force: Visual-Force Imitation for Real-World Mobile Manipulation

Aug 07, 2023