Zhiheng Ma

Trajectory-Diversity-Driven Robust Vision-and-Language Navigation

Mar 16, 2026
Via arXiv

Neural Implicit Action Fields: From Discrete Waypoints to Continuous Functions for Vision-Language-Action Models

Mar 02, 2026
Via arXiv

ReMoT: Reinforcement Learning with Motion Contrast Triplets

Feb 28, 2026
Via arXiv

ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning

Feb 11, 2026
Via arXiv

P2L-CA: An Effective Parameter Tuning Framework for Rehearsal-Free Multi-Label Class-Incremental Learning

Jan 19, 2026
Via arXiv

DD-Ranking: Rethinking the Evaluation of Dataset Distillation

May 19, 2025
Via arXiv

CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization

May 06, 2025
Via arXiv

ComprehendEdit: A Comprehensive Dataset and Evaluation Framework for Multimodal Knowledge Editing

Dec 17, 2024
Via arXiv

FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training

Nov 18, 2024
Via arXiv

Multi-modal Crowd Counting via Modal Emulation

Jul 28, 2024
Via arXiv