Zhiyu Tan

Unraveling MMDiT Blocks: Training-free Analysis and Enhancement of Text-conditioned Diffusion

Jan 05, 2026, via arXiv

A unified multimodal understanding and generation model for cross-disciplinary scientific research

Jan 04, 2026, via arXiv

Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision

Aug 07, 2025, via arXiv

Omni-Video: Democratizing Unified Video Understanding and Generation

Jul 09, 2025, via arXiv

SDPO: Importance-Sampled Direct Preference Optimization for Stable Diffusion Training

May 28, 2025, via arXiv

Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption

Mar 12, 2025, via arXiv

SARA: Structural and Adversarial Representation Alignment for Training-efficient Diffusion Models

Mar 11, 2025, via arXiv

Raccoon: Multi-stage Diffusion Training with Coarse-to-Fine Curating Videos

Feb 28, 2025, via arXiv

IPO: Iterative Preference Optimization for Text-to-Video Generation

Feb 05, 2025, via arXiv

E2EDiff: Direct Mapping from Noise to Data for Enhanced Diffusion Models

Dec 30, 2024, via arXiv