Title: U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers

May 04, 2024

Yuchuan Tian, Zhijun Tu, Hanting Chen, Jie Hu, Chao Xu, Yunhe Wang

Abstract: Diffusion Transformers (DiTs) introduce the transformer architecture to diffusion tasks for latent-space image generation. With an isotropic architecture that chains a series of transformer blocks, DiTs demonstrate competitive performance and good scalability; meanwhile, the abandonment of U-Net by DiTs and their subsequent improvements is worth rethinking. To this end, we conduct a simple toy experiment comparing a U-Net-architectured DiT with an isotropic one. It turns out that the U-Net architecture gains only a slight advantage even with the U-Net inductive bias, indicating potential redundancies within the U-Net-style DiT. Inspired by the discovery that U-Net backbone features are low-frequency-dominated, we perform token downsampling on the query-key-value tuple for self-attention, which brings further improvements despite a considerable reduction in computation. Based on self-attention with downsampled tokens, we propose a series of U-shaped DiTs (U-DiTs) in the paper and conduct extensive experiments to demonstrate the extraordinary performance of U-DiT models. The proposed U-DiT can outperform DiT-XL/2 with only 1/6 of its computation cost. Code is available at https://github.com/YuchuanTian/U-DiT.
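To give a rough feel for why downsampling the tokens that enter self-attention can cut computation, the sketch below counts query-key pairs before and after downsampling. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how tokens are downsampled, so the interleaved factor-2 split, the 16 x 16 token grid, and the helper names `split_interleaved`, `attention_pairs`, and `downsampled_attention_pairs` are all hypothetical.

```python
# Assumption: tokens sit on an H x W grid and are split into factor**2
# interleaved sub-grids, with self-attention run inside each sub-grid.
# The abstract only states that the query-key-value tokens are downsampled.

def split_interleaved(grid, factor=2):
    """Split a 2D grid into factor**2 interleaved sub-grids; no token is dropped."""
    return [
        [row[dx::factor] for row in grid[dy::factor]]
        for dy in range(factor)
        for dx in range(factor)
    ]


def attention_pairs(num_tokens):
    """Number of query-key pairs scored by full self-attention over num_tokens."""
    return num_tokens * num_tokens


def downsampled_attention_pairs(height, width, factor=2):
    """Total query-key pairs when attention runs separately on each sub-grid."""
    grid = [[(y, x) for x in range(width)] for y in range(height)]
    subgrids = split_interleaved(grid, factor)
    return sum(
        attention_pairs(sum(len(row) for row in subgrid)) for subgrid in subgrids
    )


# Hypothetical 16 x 16 token grid (256 tokens).
full = attention_pairs(16 * 16)
down = downsampled_attention_pairs(16, 16, factor=2)
print(full, down, full / down)
```

Under these assumptions the pairwise attention cost drops by a factor of 4, since each of the four sub-grids holds a quarter of the tokens. This covers only the attention-score term; it does not reproduce the 1/6 overall computation figure quoted in the abstract, which depends on the full model configuration.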

* 11 pages, 5 figures
