Yuhao Dong

Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models

Mar 18, 2026
Via arXiv

VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining

Mar 16, 2026
Via arXiv

Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition

Feb 09, 2026
Via arXiv

Kimi K2.5: Visual Agentic Intelligence

Feb 02, 2026
Via arXiv

The RoboSense Challenge: Sense Anything, Navigate Anywhere, Adapt Across Platforms

Jan 08, 2026
Via arXiv

Visual Grounding from Event Cameras

Sep 11, 2025
Via arXiv

Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras

Jul 23, 2025
Via arXiv

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

Jun 26, 2025
Via arXiv

Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning

Jun 16, 2025
Via arXiv

EgoLife: Towards Egocentric Life Assistant

Mar 05, 2025
Via arXiv