Yu Zhou

National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China; Fanyu AI Laboratory, Zhongke Fanyu Technology Co., Ltd, Beijing, China

DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation

Oct 16, 2025
Via arXiv

LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training

Oct 16, 2025
Via arXiv

Customizing Visual Emotion Evaluation for MLLMs: An Open-vocabulary, Multifaceted, and Scalable Approach

Sep 26, 2025
Via arXiv

A Correction for the Paper "Symplectic geometry mode decomposition and its application to rotating machinery compound fault diagnosis"

Aug 29, 2025
Via arXiv

PathMR: Multimodal Visual Reasoning for Interpretable Pathology Diagnosis

Aug 28, 2025
Via arXiv

TADoc: Robust Time-Aware Document Image Dewarping

Aug 09, 2025
Via arXiv

Gather and Trace: Rethinking Video TextVQA from an Instance-oriented Perspective

Aug 06, 2025
Via arXiv

Uni-DocDiff: A Unified Document Restoration Model Based on Diffusion

Aug 06, 2025
Via arXiv

Step-Audio 2 Technical Report

Jul 24, 2025
Via arXiv

Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation

Jul 10, 2025
Via arXiv