Zhengcong Fei

FLUX that Plays Music

Sep 01, 2024
Via arXiv

Scaling Diffusion Transformers to 16 Billion Parameters

Jul 16, 2024
Via arXiv

Dimba: Transformer-Mamba Diffusion Models

Jun 03, 2024
Via arXiv

Music Consistency Models

Apr 20, 2024
Via arXiv

Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models

Apr 06, 2024
Via arXiv

Scalable Diffusion Models with State Space Backbone

Feb 25, 2024
Via arXiv

Tuning-Free Inversion-Enhanced Control for Consistent Image Editing

Dec 22, 2023
Via arXiv

A-JEPA: Joint-Embedding Predictive Architecture Can Listen

Nov 28, 2023
Via arXiv

Prefix-diffusion: A Lightweight Diffusion Model for Diverse Image Captioning

Sep 10, 2023
Via arXiv

DiT: Efficient Vision Transformers with Dynamic Token Routing

Aug 11, 2023
Via arXiv