The most advanced diffusion models have recently adopted increasingly deep stacked networks (e.g., U-Net or Transformer) to promote emergent generative capabilities in vision generation models, similar to large language models (LLMs). However, progressively deeper stacked networks intuitively accumulate numerical propagation errors and degrade noise-prediction quality on generative data, which hinders massively deep, scalable training of vision generation models. In this paper, we first show that the ability of neural networks to perform generative denoising effectively stems from the fact that the intrinsic residual unit shares consistent dynamics with the reverse diffusion process of the input signal, which underpins their strong generative abilities. Building on two common types of deep stacked networks, we then propose a unified and massively scalable framework, Neural Residual Diffusion Models (Neural-RDM for short), a simple yet meaningful change to the common architecture of deep generative networks that introduces a series of learnable gated residual parameters conforming to the generative dynamics. Experimental results on various generative tasks show that the proposed neural residual models achieve state-of-the-art scores on image and video generation benchmarks. Rigorous theoretical proofs and extensive experiments further demonstrate the advantages of this simple gated residual mechanism, consistent with dynamic modeling, in improving the fidelity and consistency of generated content and in supporting large-scale scalable training. Code is available at https://github.com/Anonymous/Neural-RDM.
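To make the gated residual idea concrete, the following is a minimal sketch, not the authors' implementation: each stacked block updates its hidden state as x <- x + alpha * F(x), where alpha is a learnable gate. The block name, the inner network F, and the near-zero gate initialization are illustrative assumptions.

```python
import torch
import torch.nn as nn


class GatedResidualBlock(nn.Module):
    """Hypothetical gated residual unit: x <- x + gate * F(x).

    The inner network F and the near-zero gate initialization are
    illustrative assumptions, not the paper's exact design.
    """

    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        # Placeholder inner network F (pre-norm MLP).
        self.fn = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim * expansion),
            nn.GELU(),
            nn.Linear(dim * expansion, dim),
        )
        # Learnable scalar gate; starting near zero keeps each block close
        # to an identity map, which helps very deep stacks stay trainable.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.gate * self.fn(x)


if __name__ == "__main__":
    # Stack many such blocks; the residual path dominates at initialization.
    blocks = nn.Sequential(*[GatedResidualBlock(64) for _ in range(48)])
    out = blocks(torch.randn(2, 16, 64))
    print(out.shape)  # torch.Size([2, 16, 64])
```

In this reading, the gate plays the role of a step-size-like coefficient on the residual branch, which is one way a discrete residual update can be kept consistent with a continuous reverse-diffusion dynamic; the exact parameterization in Neural-RDM may differ.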