Haochen Wang

SAMTok: Representing Any Mask with Two Words

Jan 22, 2026

MMFormalizer: Multimodal Autoformalization in the Wild

Jan 06, 2026

MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation

Nov 18, 2025

CrossVid: A Comprehensive Benchmark for Evaluating Cross-Video Reasoning in Multimodal Large Language Models

Nov 15, 2025

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs

Nov 13, 2025

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

Oct 23, 2025

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Oct 22, 2025

DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving

Oct 14, 2025

Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology

Jul 10, 2025

Holistic Tokenizer for Autoregressive Image Generation

Jul 03, 2025