Binxu Wang

An Analytical Theory of Power Law Spectral Bias in the Learning Dynamics of Diffusion Models

Mar 05, 2025

Binxu Wang

Abstract: We developed an analytical framework for understanding how the learned distribution evolves during diffusion model training. Leveraging the Gaussian equivalence principle, we derived exact solutions for the gradient-flow dynamics of weights in one- or two-layer linear denoiser settings with arbitrary data. Remarkably, these solutions allowed us to derive the generated distribution in closed form and its KL divergence through training. These analytical results expose a pronounced power-law spectral bias, i.e., for weights and distributions, the convergence time of a mode follows an inverse power law of its variance. Empirical experiments on both Gaussian and image datasets demonstrate that the power-law spectral bias remains robust even when using deeper or convolutional architectures. Our results underscore the importance of the data covariance in dictating the order and rate at which diffusion models learn different modes of the data, providing potential explanations for why earlier stopping could lead to incorrect details in image generative models.

* 50 pages, 10 figures. Preprint

Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models

Feb 18, 2025

Thomas Fel, Ekdeep Singh Lubana, Jacob S. Prince, Matthew Kowal, Victor Boutin, Isabel Papadimitriou, Binxu Wang, Martin Wattenberg, Demba Ba, Talia Konkle

Abstract: Sparse Autoencoders (SAEs) have emerged as a powerful framework for machine learning interpretability, enabling the unsupervised decomposition of model representations into a dictionary of abstract, human-interpretable concepts. However, we reveal a fundamental limitation: existing SAEs exhibit severe instability, as identical models trained on similar datasets can produce sharply different dictionaries, undermining their reliability as an interpretability tool. To address this issue, we draw inspiration from the Archetypal Analysis framework introduced by Cutler & Breiman (1994) and present Archetypal SAEs (A-SAE), wherein dictionary atoms are constrained to the convex hull of data. This geometric anchoring significantly enhances the stability of inferred dictionaries, and their mildly relaxed variants RA-SAEs further match state-of-the-art reconstruction abilities. To rigorously assess dictionary quality learned by SAEs, we introduce two new benchmarks that test (i) plausibility, if dictionaries recover "true" classification directions and (ii) identifiability, if dictionaries disentangle synthetic concept mixtures. Across all evaluations, RA-SAEs consistently yield more structured representations while uncovering novel, semantically meaningful concepts in large-scale vision models.
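
The convex-hull constraint described in this abstract can be illustrated with a minimal NumPy sketch. It assumes one simple way to enforce the constraint: each atom is a softmax-weighted average of data points. The softmax parameterization, the function name `archetypal_dictionary`, and the toy data are assumptions for illustration; the abstract does not specify how the authors implement the constraint.

```python
import numpy as np

def archetypal_dictionary(logits, data):
    """Build dictionary atoms as convex combinations of data points.

    logits: (n_atoms, n_points) unconstrained parameters (hypothetical name).
    data:   (n_points, dim) data matrix.
    A row-wise softmax makes every weight row non-negative and sum to one,
    so each atom lies in the convex hull of the rows of `data`.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(shifted)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ data, weights

# Toy data (random values, for illustration only).
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 8))
logits = rng.normal(size=(4, 50))
atoms, weights = archetypal_dictionary(logits, data)
```

Because every atom is a weighted average of data rows, it cannot leave the region spanned by the data, which is the geometric anchoring the abstract refers to.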

The Unreasonable Effectiveness of Gaussian Score Approximation for Diffusion Models and its Applications

Dec 12, 2024

Binxu Wang, John J. Vastola

Abstract: By learning the gradient of smoothed data distributions, diffusion models can iteratively generate samples from complex distributions. The learned score function enables their generalization capabilities, but how the learned score relates to the score of the underlying data manifold remains largely unclear. Here, we aim to elucidate this relationship by comparing learned neural scores to the scores of two kinds of analytically tractable distributions: Gaussians and Gaussian mixtures. The simplicity of the Gaussian model makes it theoretically attractive, and we show that it admits a closed-form solution and predicts many qualitative aspects of sample generation dynamics. We claim that the learned neural score is dominated by its linear (Gaussian) approximation for moderate to high noise scales, and supply both theoretical and empirical arguments to support this claim. Moreover, the Gaussian approximation empirically works for a larger range of noise scales than naive theory suggests it should, and is preferentially learned early in training. At smaller noise scales, we observe that learned scores are better described by a coarse-grained (Gaussian mixture) approximation of training data than by the score of the training distribution, a finding consistent with generalization. Our findings enable us to precisely predict the initial phase of trained models' sampling trajectories through their Gaussian approximations. We show that this allows the skipping of the first 15-30% of sampling steps while maintaining high sample quality (with a near state-of-the-art FID score of 1.93 on CIFAR-10 unconditional generation). This forms the foundation of a novel hybrid sampling method, termed analytical teleportation, which can seamlessly integrate with and accelerate existing samplers, including DPM-Solver-v3 and UniPC. Our findings suggest ways to improve the design and training of diffusion models.

* Transactions on Machine Learning Research, 2024. https://openreview.net/forum?id=I0uknSHM2j
* 69 pages, 34 figures. Published in TMLR. Previous shorter versions at arxiv.org/abs/2303.02490 and arxiv.org/abs/2311.10892
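
The linear (Gaussian) score referred to in the abstract above can be written down directly. The sketch below assumes a variance-exploding noising convention, x_sigma = x_0 + sigma * eps, which is an assumption and may differ from the paper's setup. Under it, Gaussian data N(mu, cov) becomes N(mu, cov + sigma^2 I) after noising, and the score is linear in x. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def gaussian_score(x, mu, cov, sigma):
    """Score of N(mu, cov) after adding isotropic Gaussian noise of std sigma.

    The noised density is N(mu, cov + sigma^2 I), so its score
    grad_x log p_sigma(x) equals -(cov + sigma^2 I)^{-1} (x - mu).
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    noised_cov = np.asarray(cov, dtype=float) + sigma**2 * np.eye(mu.shape[0])
    # Solve the linear system rather than forming the inverse explicitly.
    return -np.linalg.solve(noised_cov, x - mu)

# Toy 2-D example (hypothetical values, for illustration only).
mu = np.array([1.0, -1.0])
cov = np.array([[3.0, 0.0], [0.0, 1.0]])
score = gaussian_score([3.0, 0.0], mu, cov, sigma=1.0)
```

In practice `mu` and `cov` would be the empirical mean and covariance of a training set; here they are made-up values chosen so the result is easy to check by hand.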

Diverse capability and scaling of diffusion and auto-regressive models when learning abstract rules

Nov 12, 2024

Binxu Wang, Jiaqi Shang, Haim Sompolinsky

Abstract: Humans excel at discovering regular structures from limited samples and applying inferred rules to novel settings. We investigate whether modern generative models can similarly learn underlying rules from finite samples and perform reasoning through conditional sampling. Inspired by the Raven's Progressive Matrices task, we designed the GenRAVEN dataset, where each sample consists of three rows, and one of 40 relational rules governing the object position, number, or attributes applies to all rows. We trained generative models to learn the data distribution, where samples are encoded as integer arrays to focus on rule learning. We compared two generative model families: diffusion (EDM, DiT, SiT) and autoregressive models (GPT2, Mamba). We evaluated their ability to generate structurally consistent samples and perform panel completion via unconditional and conditional sampling. We found that diffusion models excel at unconditional generation, producing more novel and consistent samples from scratch and memorizing less, but perform less well in panel completion, even with advanced conditional sampling methods. Conversely, autoregressive models excel at completing missing panels in a rule-consistent manner but generate less consistent samples unconditionally. We observe diverse data scaling behaviors: for both model families, rule learning emerges at a certain dataset size, on the order of thousands of examples per rule. With more training data, diffusion models improve both their unconditional and conditional generation capabilities. However, for autoregressive models, while panel completion improves with more training data, unconditional generation consistency declines. Our findings highlight complementary capabilities and limitations of diffusion and autoregressive models in rule learning and reasoning tasks, suggesting avenues for further research into their mechanisms and potential for human-like reasoning.

* 12 pages, 5 figures. Accepted to the NeurIPS 2024 Workshop on System 2 Reasoning At Scale as a long paper

The Hidden Linear Structure in Score-Based Models and its Application

Nov 17, 2023

Binxu Wang, John J. Vastola

Abstract: Score-based models have achieved remarkable results in the generative modeling of many domains. By learning the gradient of a smoothed data distribution, they can iteratively generate samples from complex distributions, e.g., natural images. However, is there any universal structure in the gradient field that will eventually be learned by any neural network? Here, we aim to find such structures through a normative analysis of the score function. First, we derived the closed-form solution to the score-based model with a Gaussian score. We claimed that for well-trained diffusion models, the learned score at a high noise scale is well approximated by the linear score of a Gaussian. We demonstrated this through empirical validation of pre-trained image diffusion models and theoretical analysis of the score function. This finding enabled us to precisely predict the initial diffusion trajectory using the analytical solution and to accelerate image sampling by 15-30% by skipping the initial phase without sacrificing image quality. Our finding of the linear structure in the score-based model has implications for better model design and data pre-processing.

* Accepted to Workshop on Diffusion Models in NeurIPS 2023. 24 pages, 8 figures

Diffusion Models Generate Images Like Painters: an Analytical Theory of Outline First, Details Later

Mar 04, 2023

Binxu Wang, John J. Vastola

Abstract: How do diffusion generative models convert pure noise into meaningful images? We argue that generation involves first committing to an outline, and then to finer and finer details. The corresponding reverse diffusion process can be modeled by dynamics on a (time-dependent) high-dimensional landscape full of Gaussian-like modes, which makes the following predictions: (i) individual trajectories tend to be very low-dimensional; (ii) scene elements that vary more within training data tend to emerge earlier; and (iii) early perturbations substantially change image content more often than late perturbations. We show that the behavior of a variety of trained unconditional and conditional diffusion models like Stable Diffusion is consistent with these predictions. Finally, we use our theory to search for the latent image manifold of diffusion models, and propose a new way to generate interpretable image variations. Our viewpoint suggests generation by GANs and diffusion models have unexpected similarities.

* 36 pages, 27 figures

On the Level Sets and Invariance of Neural Tuning Landscapes

Dec 26, 2022

Binxu Wang, Carlos R. Ponce

Abstract: Visual representations can be defined as the activations of neuronal populations in response to images. The activation of a neuron as a function over all image space has been described as a "tuning landscape". As a function over a high-dimensional space, what is the structure of this landscape? In this study, we characterize tuning landscapes through the lens of level sets and Morse theory. A recent study measured the in vivo two-dimensional tuning maps of neurons in different brain regions. Here, we developed a statistically reliable signature for these maps based on the change of topology in level sets. We found this topological signature changed progressively throughout the cortical hierarchy, with similar trends found for units in convolutional neural networks (CNNs). Further, we analyzed the geometry of level sets on the tuning landscapes of CNN units. We advanced the hypothesis that higher-order units can be locally regarded as isotropic radial basis functions, but not globally. This shows the power of level sets as a conceptual tool to understand neuronal activations over image space.

* 24 pages, 13 figures. Published in NeurIPS 2022 Workshop on Symmetry and Geometry in Neural Representations, and PMLR volume 197

High-performance Evolutionary Algorithms for Online Neuron Control

Apr 14, 2022

Binxu Wang, Carlos R. Ponce

Abstract: Recently, optimization has become an emerging tool for neuroscientists to study the neural code. In the visual system, neurons respond to images with graded and noisy responses. Image patterns eliciting the highest responses are diagnostic of the coding content of the neuron. To find these patterns, we have used black-box optimizers to search a 4096d image space, leading to the evolution of images that maximize neuronal responses. Although genetic algorithms (GAs) have been commonly used, there have not been any systematic investigations to reveal the best-performing optimizer or the underlying principles necessary to improve them. Here, we conducted a large-scale in silico benchmark of optimizers for activation maximization and found that Covariance Matrix Adaptation (CMA) excelled in its achieved activation. We compared CMA against GA and found that CMA surpassed the maximal activation of GA by 66% in silico and 44% in vivo. We analyzed the structure of evolution trajectories and found that the key to success was not covariance matrix adaptation, but local search towards informative dimensions and an effective step size decay. Guided by these principles and the geometry of the image manifold, we developed the SphereCMA optimizer, which competed well against CMA, proving the validity of the identified principles. Code available at https://github.com/Animadversio/ActMax-Optimizer-Dev

* 19 pages, 22 figures, 3 tables. Accepted as a full paper at the Genetic and Evolutionary Computation Conference 2022

On the use of Cortical Magnification and Saccades as Biological Proxies for Data Augmentation

Dec 14, 2021

Binxu Wang, David Mayo, Arturo Deza, Andrei Barbu, Colin Conwell

Abstract: Self-supervised learning is a powerful way to learn useful representations from natural data. It has also been suggested as one possible means of building visual representation in humans, but the specific objective and algorithm are unknown. Currently, most self-supervised methods encourage the system to learn an invariant representation of different transformations of the same image in contrast to those of other images. However, such transformations are generally not biologically plausible, and often consist of contrived perceptual schemes such as random cropping and color jittering. In this paper, we attempt to reverse-engineer these augmentations to be more biologically or perceptually plausible while still conferring the same benefits for encouraging robust representation. Critically, we find that random cropping can be substituted by cortical magnification, and saccade-like sampling of the image could also assist the representation learning. The feasibility of these transformations suggests a potential way that biological visual systems could implement self-supervision. Further, they break the widely accepted spatially-uniform processing assumption used in many computer vision algorithms, suggesting a role for spatially-adaptive computation in humans and machines alike. Our code and demo are available at https://github.com/Animadversio/Foveated_Saccade_SimCLR.

* 14 pages, 6 figures, 2 tables. Published in NeurIPS 2021 Workshop, Shared Visual Representations in Human & Machine Intelligence (SVRHM). For code, see https://github.com/Animadversio/Foveated_Saccade_SimCLR

The Geometry of Deep Generative Image Models and its Applications

Jan 15, 2021

Binxu Wang, Carlos R. Ponce

Abstract: Generative adversarial networks (GANs) have emerged as a powerful unsupervised method to model the statistical patterns of real-world data sets, such as natural images. These networks are trained to map random inputs in their latent space to new samples representative of the learned data. However, the structure of the latent space is hard to intuit due to its high dimensionality and the non-linearity of the generator, which limits the usefulness of the models. Understanding the latent space requires a way to identify input codes for existing real-world images (inversion), and a way to identify directions with known image transformations (interpretability). Here, we use a geometric framework to address both issues simultaneously. We develop an architecture-agnostic method to compute the Riemannian metric of the image manifold created by GANs. The eigen-decomposition of the metric isolates axes that account for different levels of image variability. An empirical analysis of several pretrained GANs shows that image variation around each position is concentrated along surprisingly few major axes (the space is highly anisotropic) and the directions that create this large variation are similar at different positions in the space (the space is homogeneous). We show that many of the top eigenvectors correspond to interpretable transforms in the image space, with a substantial part of eigenspace corresponding to minor transforms which could be compressed out. This geometric understanding unifies key previous results related to GAN interpretability. We show that the use of this metric allows for more efficient optimization in the latent space (e.g. GAN inversion) and facilitates unsupervised discovery of interpretable axes. Our results illustrate that defining the geometry of the GAN image manifold can serve as a general framework for understanding GANs.

* 23 pages, 10 figures. Published as a conference paper at ICLR 2021
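
The idea of a metric on latent space and its eigen-decomposition, as described in the abstract above, can be sketched on a toy problem. The sketch assumes a plain squared Euclidean distance between generated outputs, in which case the induced metric at a latent code z is J^T J, with J the generator's Jacobian. That choice of distance, the finite-difference Jacobian, and the toy generator are all assumptions for illustration; the abstract does not say which image distance or which computation the authors use.

```python
import numpy as np

def numerical_jacobian(generator, z, eps=1e-5):
    """Central-difference Jacobian of `generator` at latent code z."""
    z = np.asarray(z, dtype=float)
    columns = []
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = eps
        columns.append((generator(z + step) - generator(z - step)) / (2 * eps))
    return np.stack(columns, axis=1)  # shape (image_dim, latent_dim)

def pullback_metric(generator, z):
    """Metric J^T J induced on latent space by squared Euclidean output distance."""
    jac = numerical_jacobian(generator, z)
    return jac.T @ jac

# Toy "generator" (hypothetical): a fixed random linear map followed by tanh.
rng = np.random.default_rng(0)
weights = rng.normal(size=(16, 4))

def toy_generator(z):
    return np.tanh(weights @ z)

metric = pullback_metric(toy_generator, np.zeros(4))
# eigh returns eigenvalues in ascending order; the last eigenvectors are the
# latent directions that change the output the most.
eigenvalues, eigenvectors = np.linalg.eigh(metric)
```

A spectrum dominated by a few large eigenvalues would correspond to the anisotropy the abstract reports; this toy generator is not expected to reproduce that finding.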
