Shanghang Zhang

UV-M3TL: A Unified and Versatile Multimodal Multi-Task Learning Framework for Assistive Driving Perception

Feb 02, 2026
Via arXiv

Latent Reasoning VLA: Latent Thinking and Prediction for Vision-Language-Action Models

Feb 01, 2026
Via arXiv

EmboCoach-Bench: Benchmarking AI Agents on Developing Embodied Robots

Jan 29, 2026
Via arXiv

TC-IDM: Grounding Video Generation for Executable Zero-shot Robot Motion

Jan 26, 2026
Via arXiv

PhysicsMind: Sim and Real Mechanics Benchmarking for Physical Reasoning and Prediction in Foundational VLMs and World Models

Jan 22, 2026
Via arXiv

RoboBrain 2.5: Depth in Sight, Time in Mind

Jan 20, 2026
Via arXiv

Wow, wo, val! A Comprehensive Embodied World Model Evaluation Turing Test

Jan 07, 2026
Via arXiv

Action-Sketcher: From Reasoning to Action via Visual Sketches for Long-Horizon Robotic Manipulation

Jan 04, 2026
Via arXiv

RoboMIND 2.0: A Multimodal, Bimanual Mobile Manipulation Dataset for Generalizable Embodied Intelligence

Dec 31, 2025
Via arXiv

RoboMirror: Understand Before You Imitate for Video to Humanoid Locomotion

Dec 30, 2025
Via arXiv