Yuanxing Zhang

DiaDem: Advancing Dialogue Descriptions in Audiovisual Video Captioning for Multimodal Large Language Models

Jan 27, 2026

CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation

Jan 15, 2026

SemanticGen: Video Generation in Semantic Space

Dec 24, 2025

T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation

Dec 24, 2025

Visual-Aware CoT: Achieving High-Fidelity Visual Consistency in Unified Models

Dec 22, 2025

Kling-Omni Technical Report

Dec 18, 2025

KlingAvatar 2.0 Technical Report

Dec 15, 2025

Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling

Dec 14, 2025

SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder

Dec 12, 2025

The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss

Dec 09, 2025