Title: DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis

May 23, 2024

Yao Teng, Yue Wu, Han Shi, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui Liu

Abstract: Diffusion models have achieved great success in image generation, with the backbone evolving from U-Net to Vision Transformers. However, the computational cost of Transformers is quadratic in the number of tokens, which poses significant challenges for high-resolution images. In this work, we propose Diffusion Mamba (DiM), which combines the efficiency of Mamba, a sequence model based on State Space Models (SSM), with the expressive power of diffusion models for efficient high-resolution image synthesis. To address the challenge that Mamba cannot generalize to 2D signals, we introduce several architectural designs, including multi-directional scans, learnable padding tokens at the end of each row and column, and lightweight local feature enhancement. Our DiM architecture achieves inference-time efficiency for high-resolution images. In addition, to further improve training efficiency for high-resolution image generation with DiM, we investigate a "weak-to-strong" training strategy that pretrains DiM on low-resolution images ($256\times 256$) and then fine-tunes it on high-resolution images ($512 \times 512$). We further explore training-free upsampling strategies that enable the model to generate higher-resolution images (e.g., $1024\times 1024$ and $1536\times 1536$) without further fine-tuning. Experiments demonstrate the effectiveness and efficiency of our DiM.
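To make the idea of multi-directional scans with row and column padding tokens more concrete, the following is a minimal NumPy sketch, not the authors' implementation. The function names (`flatten_with_padding`, `multi_directional_sequences`), the choice of four scan directions, and the use of a fixed vector in place of a learnable padding token are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np

def flatten_with_padding(tokens, pad, order="row"):
    """Flatten an (H, W, C) token grid into a 1D sequence.

    A padding token is appended after every row (order="row") or after
    every column (order="col"), so a 1D sequence model can see where a
    line of the 2D grid ends. In a trained model `pad` would be a learnable
    parameter; here it is a fixed vector (an assumption for illustration).
    """
    if order == "col":
        # Swap the spatial axes so that columns are visited like rows.
        tokens = tokens.transpose(1, 0, 2)
    h, w, c = tokens.shape
    pad_column = np.broadcast_to(pad, (h, 1, c))
    padded = np.concatenate([tokens, pad_column], axis=1)  # (h, w + 1, c)
    return padded.reshape(h * (w + 1), c)

def multi_directional_sequences(tokens, pad):
    """Return four hypothetical scan orders over the same token grid."""
    row = flatten_with_padding(tokens, pad, "row")
    col = flatten_with_padding(tokens, pad, "col")
    return {
        "row_forward": row,
        "row_backward": row[::-1],  # simple reversal of the padded sequence
        "col_forward": col,
        "col_backward": col[::-1],
    }

# Tiny example: a 2 x 3 grid of 1-channel tokens, padding value -1.
grid = np.arange(6).reshape(2, 3, 1)
pad_token = np.array([-1])
scans = multi_directional_sequences(grid, pad_token)
print(scans["row_forward"].ravel().tolist())
print(scans["col_forward"].ravel().tolist())
```

In this sketch the backward scans are plain reversals, so each padding token ends up at the start of a line rather than the end; an actual model might handle this differently. Each sequence would then be processed by a 1D sequence model such as a Mamba block, with the padding tokens removed before the result is reshaped back into a grid.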
