Zhaoxiang Zhang

DynVLA: Learning World Dynamics for Action Reasoning in Autonomous Driving

Mar 11, 2026

GA-Drive: Geometry-Appearance Decoupled Modeling for Free-viewpoint Driving Scene Generation

Feb 24, 2026

FeatureBench: Benchmarking Agentic Coding for Complex Feature Development

Feb 11, 2026

WorldArena: A Unified Benchmark for Evaluating Perception and Functional Utility of Embodied World Models

Feb 09, 2026

NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos

Jan 01, 2026

Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements

Dec 31, 2025

T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation

Dec 24, 2025

VTCBench: Can Vision-Language Models Understand Long Context with Vision-Text Compression?

Dec 23, 2025

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Dec 14, 2025

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs

Nov 13, 2025