Rui Shao

Multimodal Large Language Models-Enabled UAV Swarm: Towards Efficient and Intelligent Autonomous Aerial Systems

Jun 15, 2025
Via arXiv

Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts

Jun 12, 2025
Via arXiv

Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills

Jun 12, 2025
Via arXiv

STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization

Jun 04, 2025
Via arXiv

GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent

May 22, 2025
Via arXiv

DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer

Apr 28, 2025
Via arXiv

TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs

Mar 13, 2025
Via arXiv

Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation

Mar 13, 2025
Via arXiv

LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant

Mar 05, 2025
Via arXiv

Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy

Feb 27, 2025
Via arXiv