Di Hu

Crab$^{+}$: A Scalable and Unified Audio-Visual Scene Understanding Model with Explicit Cooperation

Mar 04, 2026
Via arXiv

APPO: Attention-guided Perception Policy Optimization for Video Reasoning

Mar 03, 2026
Via arXiv

GeCo-SRT: Geometry-aware Continual Adaptation for Robotic Cross-Task Sim-to-Real Transfer

Feb 25, 2026
Via arXiv

When would Vision-Proprioception Policies Fail in Robotic Manipulation?

Feb 12, 2026
Via arXiv

AnyTouch 2: General Optical Tactile Representation Learning For Dynamic Tactile Perception

Feb 10, 2026
Via arXiv

Video Detective: Seek Critical Clues Recurrently to Answer Question from Long Videos

Dec 19, 2025
Via arXiv

Understanding Stigmatizing Language Lexicons: A Comparative Analysis in Clinical Contexts

Sep 09, 2025
Via arXiv

Position: Intelligent Science Laboratory Requires the Integration of Cognitive and Embodied AI

Jun 24, 2025
Via arXiv

RollingQ: Reviving the Cooperation Dynamics in Multimodal Transformer

Jun 13, 2025
Via arXiv

Robotic Policy Learning via Human-assisted Action Preference Optimization

Jun 08, 2025
Via arXiv