Di Hu

Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception

Apr 09, 2025

Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition

Mar 24, 2025

Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation

Mar 17, 2025

Enhancing Modality Representation and Alignment for Multimodal Cold-start Active Learning

Dec 12, 2024

On-the-fly Modulation for Balanced Multimodal Learning

Oct 15, 2024

Depth Helps: Improving Pre-trained RGB-based Policy with Depth Information Injection

Aug 09, 2024

KOI: Accelerating Online Imitation Learning via Hybrid Key-state Guidance

Aug 06, 2024

Play to the Score: Stage-Guided Dynamic Multi-Sensory Fusion for Robotic Manipulation

Aug 02, 2024

Boosting Audio Visual Question Answering via Key Semantic-Aware Cues

Jul 30, 2024

Towards Effective and Efficient Continual Pre-training of Large Language Models

Jul 26, 2024