Picture for Danny Driess

Danny Driess

Vision Language Models are In-Context Value Learners

Add code
Nov 07, 2024
Viaarxiv icon

RT-Affordance: Affordances are Versatile Intermediate Representations for Robot Manipulation

Add code
Nov 05, 2024
Viaarxiv icon

$π_0$: A Vision-Language-Action Flow Model for General Robot Control

Add code
Oct 31, 2024
Viaarxiv icon

ALOHA Unleashed: A Simple Recipe for Robot Dexterity

Add code
Oct 17, 2024
Viaarxiv icon

Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers

Add code
Mar 19, 2024
Figure 1 for Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers
Figure 2 for Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers
Figure 3 for Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers
Figure 4 for Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers
Viaarxiv icon

PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

Add code
Feb 12, 2024
Figure 1 for PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Figure 2 for PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Figure 3 for PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Figure 4 for PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Viaarxiv icon

SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities

Add code
Jan 22, 2024
Viaarxiv icon

Foundation Models in Robotics: Applications, Challenges, and the Future

Add code
Dec 13, 2023
Viaarxiv icon

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Add code
Oct 17, 2023
Figure 1 for Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Figure 2 for Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Figure 3 for Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Figure 4 for Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Viaarxiv icon

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

Add code
Jul 28, 2023
Viaarxiv icon