Zhiyu Tan

Raccoon: Multi-stage Diffusion Training with Coarse-to-Fine Curating Videos

Feb 28, 2025

IPO: Iterative Preference Optimization for Text-to-Video Generation

Feb 05, 2025

E2EDiff: Direct Mapping from Noise to Data for Enhanced Diffusion Models

Dec 30, 2024

LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment

Dec 06, 2024

ReToMe-VA: Recursive Token Merging for Video Diffusion-based Unrestricted Adversarial Attack

Aug 10, 2024

VidGen-1M: A Large-Scale Dataset for Text-to-video Generation

Aug 05, 2024

EVALALIGN: Supervised Fine-Tuning Multimodal LLMs with Human-Aligned Data for Evaluating Text-to-Image Models

Jun 27, 2024

EvalAlign: Evaluating Text-to-Image Models through Precision Alignment of Multimodal Large Models with Supervised Fine-Tuning to Human Annotations

Jun 24, 2024

Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation

Mar 08, 2024

OVO: Open-Vocabulary Occupancy

May 25, 2023