Enze Xie

SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Nov 07, 2024

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers

Oct 15, 2024

Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models

Oct 14, 2024

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

Oct 14, 2024

VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

Sep 06, 2024

Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts

Jul 16, 2024

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

May 09, 2024

DriveCoT: Integrating Chain-of-Thought Reasoning with End-to-End Driving

Mar 25, 2024

Editing Massive Concepts in Text-to-Image Diffusion Models

Mar 20, 2024

TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model

Mar 15, 2024