Shanghang Zhang

Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning

Mar 27, 2025

MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation

Mar 26, 2025

EmpathyAgent: Can Embodied Agents Conduct Empathetic Actions?

Mar 19, 2025

HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model

Mar 13, 2025

RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete

Feb 28, 2025

MapNav: A Novel Memory Representation via Annotated Semantic Maps for VLM-based Vision-and-Language Navigation

Feb 19, 2025

CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-World

Feb 12, 2025

LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information

Feb 04, 2025

SliceOcc: Indoor 3D Semantic Occupancy Prediction with Vertical Slice Representation

Jan 28, 2025

MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders

Jan 03, 2025