Yuanxing Zhang

Vibe AIGC: A New Paradigm for Content Generation via Agentic Orchestration

Feb 05, 2026

OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models

Feb 04, 2026

Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers

Feb 03, 2026

Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks

Feb 02, 2026

DiaDem: Advancing Dialogue Descriptions in Audiovisual Video Captioning for Multimodal Large Language Models

Jan 27, 2026

CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation

Jan 15, 2026

T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation

Dec 24, 2025

SemanticGen: Video Generation in Semantic Space

Dec 24, 2025

Visual-Aware CoT: Achieving High-Fidelity Visual Consistency in Unified Models

Dec 22, 2025

Kling-Omni Technical Report

Dec 18, 2025