Jiafei Duan

SAT: Spatial Aptitude Training for Multimodal Language Models

Dec 10, 2024

AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation

Oct 01, 2024

Manipulate-Anything: Automating Real-World Robots using Vision-Language Models

Jun 27, 2024

RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics

Jun 15, 2024

Octopi: Object Property Reasoning with Large Tactile-Language Models

May 05, 2024

EVE: Enabling Anyone to Train Robots using Augmented Reality

Apr 09, 2024

THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation

Feb 13, 2024

Selective Visual Representations Improve Convergence and Generalization for Embodied AI

Nov 07, 2023

NEWTON: Are Large Language Models Capable of Physical Reasoning?

Oct 10, 2023

AR2-D2: Training a Robot Without a Robot

Jun 23, 2023