Shaofei Cai

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

Jul 02, 2025

Toward Memory-Aided World Models: Benchmarking via Spatial Consistency

May 29, 2025

ROCKET-2: Steering Visuomotor Policy via Cross-View Goal Alignment

Mar 04, 2025

MineStudio: A Streamlined Package for Minecraft AI Agent Development

Dec 25, 2024

MineStudio: A Streamlined Package for Minecraft AI Agent Development

Dec 24, 2024

Optimizing Latent Goal by Learning from Trajectory Preference

Dec 03, 2024

ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting

Oct 23, 2024

OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

Jun 27, 2024

JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models

Nov 30, 2023

GROOT: Learning to Follow Instructions by Watching Gameplay Videos

Oct 12, 2023