Zhiyuan Zhao

Do MLLMs Really See It: Reinforcing Visual Attention in Multimodal LLMs

Feb 09, 2026

ChatUMM: Robust Context Tracking for Conversational Interleaved Generation

Feb 06, 2026

LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models

Jan 04, 2026

FusAD: Time-Frequency Fusion with Adaptive Denoising for General Time Series Analysis

Dec 16, 2025

Modular Deep-Learning-Based Early Warning System for Deadly Heatwave Prediction

Dec 09, 2025

Exploring the Underwater World Segmentation without Extra Training

Nov 11, 2025

OmniLayout: Enabling Coarse-to-Fine Learning with LLMs for Universal Document Layout Generation

Oct 30, 2025

MelCap: A Unified Single-Codebook Neural Codec for High-Fidelity Audio Compression

Oct 02, 2025

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Sep 26, 2025

PromptEnhancer: A Simple Approach to Enhance Text-to-Image Models via Chain-of-Thought Prompt Rewriting

Sep 04, 2025