Abstract:Recent studies have raised concerns about the effectiveness of Classifier-Free Guidance (CFG), indicating that in low-dimensional settings, it can lead to overshooting the target distribution and reducing sample diversity. In this work, we demonstrate that in infinite and sufficiently high-dimensional contexts CFG effectively reproduces the target distribution, revealing a blessing-of-dimensionality result. Additionally, we explore finite-dimensional effects, precisely characterizing overshoot and variance reduction. Based on our analysis, we introduce non-linear generalizations of CFG. Through numerical simulations on Gaussian mixtures and experiments on class-conditional and text-to-image diffusion models, we validate our analysis and show that our non-linear CFG offers improved flexibility and generation quality without additional computation cost.
Abstract:Recent works have shown that diffusion models can undergo phase transitions, the resolution of which is needed for accurately generating samples. This has motivated the use of different noise schedules, the two most common choices being referred to as variance preserving (VP) and variance exploding (VE). Here we revisit these schedules within the framework of stochastic interpolants. Using the Gaussian Mixture (GM) and Curie-Weiss (CW) data distributions as test case models, we first investigate the effect of the variance of the initial noise distribution and show that VP recovers the low-level feature (the distribution of each mode) but misses the high-level feature (the asymmetry between modes), whereas VE performs oppositely. We also show that this dichotomy, which happens when denoising by a constant amount in each step, can be avoided by using noise schedules specific to VP and VE that allow for the recovery of both high- and low-level features. Finally we show that these schedules yield generative models for the GM and CW model whose probability flow ODE can be discretized using $\Theta_d(1)$ steps in dimension $d$ instead of the $\Theta_d(\sqrt{d})$ steps required by constant denoising.