Daoan Zhang

VideoWeaver: Evaluating and Evolving Skills for Agentic Long Video Generation

Jun 06, 2026
Via arXiv

How Far Are Video Models from True Multimodal Reasoning?

Apr 21, 2026
Via arXiv

A Versatile Multimodal Agent for Multimedia Content Generation

Jan 06, 2026
Via arXiv

Sphinx: Benchmarking and Modeling for LLM-Driven Pull Request Review

Jan 06, 2026
Via arXiv

JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation

Dec 28, 2025
Via arXiv

VisualActBench: Can VLMs See and Act like a Human?

Dec 10, 2025
Via arXiv

UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist

Nov 11, 2025
Via arXiv

On Path to Multimodal Generalist: General-Level and General-Bench

May 07, 2025
Via arXiv

WorldGenBench: A World-Knowledge-Integrated Benchmark for Reasoning-Driven Text-to-Image Generation

May 02, 2025
Via arXiv

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)

Apr 04, 2025
Via arXiv