Shanghang Zhang

From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance

Oct 16, 2025

WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation

Oct 08, 2025

TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics

Oct 08, 2025

MathSticks: A Benchmark for Visual Symbolic Compositional Reasoning with Matchstick Puzzles

Oct 01, 2025

Can World Models Benefit VLMs for World Dynamics?

Oct 01, 2025

MLA: A Multisensory Language-Action Model for Multimodal Understanding and Forecasting in Robotic Manipulation

Sep 30, 2025

WoW: Towards a World omniscient World model Through Embodied Interaction

Sep 26, 2025

BEVUDA++: Geometric-aware Unsupervised Domain Adaptation for Multi-View 3D Object Detection

Sep 17, 2025

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

Sep 11, 2025

MMG-Vid: Maximizing Marginal Gains at Segment-level and Token-level for Efficient Video LLMs

Aug 28, 2025