
Zhiyuan Zhao

MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale

Apr 06, 2026
Via arXiv

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Mar 26, 2026
Via arXiv

Seeking Universal Shot Language Understanding Solutions

Mar 19, 2026
Via arXiv

Boosting Quantitative and Spatial Awareness for Zero-Shot Object Counting

Mar 17, 2026
Via arXiv

IntroSVG: Learning from Rendering Feedback for Text-to-SVG Generation via an Introspective Generator-Critic Framework

Mar 10, 2026
Via arXiv

UNICBench: UNIfied Counting Benchmark for MLLM

Feb 28, 2026
Via arXiv

AHAP: Reconstructing Arbitrary Humans from Arbitrary Perspectives with Geometric Priors

Feb 27, 2026
Via arXiv

Do MLLMs Really See It: Reinforcing Visual Attention in Multimodal LLMs

Feb 09, 2026
Via arXiv

ChatUMM: Robust Context Tracking for Conversational Interleaved Generation

Feb 06, 2026
Via arXiv

LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models

Jan 04, 2026
Via arXiv