Junshi Huang

FLUX that Plays Music

Sep 01, 2024

Scaling Diffusion Transformers to 16 Billion Parameters

Jul 16, 2024

Dimba: Transformer-Mamba Diffusion Models

Jun 03, 2024

Music Consistency Models

Apr 20, 2024

Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models

Apr 06, 2024

Scalable Diffusion Models with State Space Backbone

Feb 25, 2024

Tuning-Free Inversion-Enhanced Control for Consistent Image Editing

Dec 22, 2023

A-JEPA: Joint-Embedding Predictive Architecture Can Listen

Nov 28, 2023

Enriching Phrases with Coupled Pixel and Object Contexts for Panoptic Narrative Grounding

Nov 02, 2023

DiT: Efficient Vision Transformers with Dynamic Token Routing

Aug 11, 2023