Haoyu Cao

RISE-Video: Can Video Generators Decode Implicit World Rules?

Feb 05, 2026
Via arXiv

Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding

Jan 28, 2026
Via arXiv

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

Jan 27, 2026
Via arXiv

TangramPuzzle: Evaluating Multimodal Large Language Models with Compositional Spatial Reasoning

Jan 23, 2026
Via arXiv

DiG: Differential Grounding for Enhancing Fine-Grained Perception in Multimodal Large Language Model

Dec 14, 2025
Via arXiv

VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation

Oct 10, 2025
Via arXiv

CROP: Integrating Topological and Spatial Structures via Cross-View Prefixes for Molecular LLMs

Aug 09, 2025
Via arXiv

BASIC: Boosting Visual Alignment with Intrinsic Refined Embeddings in Multimodal Large Language Models

Aug 09, 2025
Via arXiv

TACO: Think-Answer Consistency for Optimized Long-Chain Reasoning and Efficient Data Learning via Reinforcement Learning in LVLMs

May 27, 2025
Via arXiv

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

May 06, 2025
Via arXiv