Zhiyu Tan

Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator

Apr 09, 2026
Via arXiv

DiverseDiT: Towards Diverse Representation Learning in Diffusion Transformers

Mar 04, 2026
Via arXiv

Diff-Aid: Inference-time Adaptive Interaction Denoising for Rectified Text-to-Image Generation

Feb 14, 2026
Via arXiv

Omni-Video 2: Scaling MLLM-Conditioned Diffusion for Unified Video Generation and Editing

Feb 09, 2026
Via arXiv

Unraveling MMDiT Blocks: Training-free Analysis and Enhancement of Text-conditioned Diffusion

Jan 05, 2026
Via arXiv

A unified multimodal understanding and generation model for cross-disciplinary scientific research

Jan 04, 2026
Via arXiv

Uni-cot: Towards Unified Chain-of-Thought Reasoning Across Text and Vision

Aug 07, 2025
Via arXiv

Omni-Video: Democratizing Unified Video Understanding and Generation

Jul 09, 2025
Via arXiv

SDPO: Importance-Sampled Direct Preference Optimization for Stable Diffusion Training

May 28, 2025
Via arXiv

Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption

Mar 12, 2025
Via arXiv