Zhuowen Tu

Efficient Scaling of Diffusion Transformers for Text-to-Image Generation

Dec 16, 2024

DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models

Oct 04, 2024

Open-World Dynamic Prompt and Continual Visual Representation Learning

Sep 09, 2024

Goldfish: Monolingual Language Models for 350 Languages

Aug 19, 2024

OmniControlNet: Dual-stage Integration for Conditional Image Generation

Jun 09, 2024

Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model

Apr 28, 2024

On the Scalability of Diffusion-based Text-to-Image Generation

Apr 03, 2024

HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data

Mar 18, 2024

Bayesian Diffusion Models for 3D Shape Reconstruction

Mar 11, 2024

Enhancing Vision-Language Pre-training with Rich Supervisions

Mar 05, 2024