Di Hu

Video Detective: Seek Critical Clues Recurrently to Answer Question from Long Videos

Dec 19, 2025
Via arXiv

Understanding Stigmatizing Language Lexicons: A Comparative Analysis in Clinical Contexts

Sep 09, 2025
Via arXiv

Position: Intelligent Science Laboratory Requires the Integration of Cognitive and Embodied AI

Jun 24, 2025
Via arXiv

RollingQ: Reviving the Cooperation Dynamics in Multimodal Transformer

Jun 13, 2025
Via arXiv

Robotic Policy Learning via Human-assisted Action Preference Optimization

Jun 08, 2025
Via arXiv

MokA: Multimodal Low-Rank Adaptation for MLLMs

Jun 05, 2025
Via arXiv

Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction

Apr 20, 2025
Via arXiv

Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception

Apr 09, 2025
Via arXiv

Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition

Mar 24, 2025
Via arXiv

Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation

Mar 17, 2025
Via arXiv