Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

David Klindt

An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research

Apr 17, 2025

Patrik Reizinger, Randall Balestriero, David Klindt, Wieland Brendel

Figure 1 for An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research

Figure 2 for An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research

Abstract:Self-Supervised Learning (SSL) powers many current AI systems. As research interest and investment grow, the SSL design space continues to expand. The Platonic view of SSL, following the Platonic Representation Hypothesis (PRH), suggests that despite different methods and engineering approaches, all representations converge to the same Platonic ideal. However, this phenomenon lacks precise theoretical explanation. By synthesizing evidence from Identifiability Theory (IT), we show that the PRH can emerge in SSL. However, current IT cannot explain SSL's empirical success. To bridge the gap between theory and practice, we propose expanding IT into what we term Singular Identifiability Theory (SITh), a broader theoretical framework encompassing the entire SSL pipeline. SITh would allow deeper insights into the implicit data assumptions in SSL and advance the field towards learning more interpretable and generalizable representations. We highlight three critical directions for future research: 1) training dynamics and convergence properties of SSL; 2) the impact of finite samples, batch size, and data diversity; and 3) the role of inductive biases in architecture, augmentations, initialization schemes, and optimizers.

Via

Access Paper or Ask Questions

From superposition to sparse codes: interpretable representations in neural networks

Mar 03, 2025

David Klindt, Charles O'Neill, Patrik Reizinger, Harald Maurer, Nina Miolane

Abstract:Understanding how information is represented in neural networks is a fundamental challenge in both neuroscience and artificial intelligence. Despite their nonlinear architectures, recent evidence suggests that neural networks encode features in superposition, meaning that input concepts are linearly overlaid within the network's representations. We present a perspective that explains this phenomenon and provides a foundation for extracting interpretable representations from neural activations. Our theoretical framework consists of three steps: (1) Identifiability theory shows that neural networks trained for classification recover latent features up to a linear transformation. (2) Sparse coding methods can extract disentangled features from these representations by leveraging principles from compressed sensing. (3) Quantitative interpretability metrics provide a means to assess the success of these methods, ensuring that extracted features align with human-interpretable concepts. By bridging insights from theoretical neuroscience, representation learning, and interpretability research, we propose an emerging perspective on understanding neural representations in both artificial and biological systems. Our arguments have implications for neural coding theories, AI transparency, and the broader goal of making deep learning models more interpretable.

Via

Access Paper or Ask Questions

Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders

Nov 20, 2024

Charles O'Neill, David Klindt

Figure 1 for Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders

Figure 2 for Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders

Figure 3 for Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders

Figure 4 for Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders

Abstract:A recent line of work has shown promise in using sparse autoencoders (SAEs) to uncover interpretable features in neural network representations. However, the simple linear-nonlinear encoding mechanism in SAEs limits their ability to perform accurate sparse inference. In this paper, we investigate sparse inference and learning in SAEs through the lens of sparse coding. Specifically, we show that SAEs perform amortised sparse inference with a computationally restricted encoder and, using compressed sensing theory, we prove that this mapping is inherently insufficient for accurate sparse inference, even in solvable cases. Building on this theory, we empirically explore conditions where more sophisticated sparse inference methods outperform traditional SAE encoders. Our key contribution is the decoupling of the encoding and decoding processes, which allows for a comparison of various sparse encoding strategies. We evaluate these strategies on two dimensions: alignment with true underlying sparse features and correct inference of sparse codes, while also accounting for computational costs during training and inference. Our results reveal that substantial performance gains can be achieved with minimal increases in compute cost. We demonstrate that this generalises to SAEs applied to large language models (LLMs), where advanced encoders achieve similar interpretability. This work opens new avenues for understanding neural network representations and offers important implications for improving the tools we use to analyse the activations of large language models.

Via

Access Paper or Ask Questions

Cross-Entropy Is All You Need To Invert the Data Generating Process

Oct 29, 2024

Patrik Reizinger, Alice Bizeul, Attila Juhos, Julia E. Vogt, Randall Balestriero, Wieland Brendel, David Klindt

Figure 1 for Cross-Entropy Is All You Need To Invert the Data Generating Process

Figure 2 for Cross-Entropy Is All You Need To Invert the Data Generating Process

Figure 3 for Cross-Entropy Is All You Need To Invert the Data Generating Process

Figure 4 for Cross-Entropy Is All You Need To Invert the Data Generating Process

Abstract:Supervised learning has become a cornerstone of modern machine learning, yet a comprehensive theory explaining its effectiveness remains elusive. Empirical phenomena, such as neural analogy-making and the linear representation hypothesis, suggest that supervised models can learn interpretable factors of variation in a linear fashion. Recent advances in self-supervised learning, particularly nonlinear Independent Component Analysis, have shown that these methods can recover latent structures by inverting the data generating process. We extend these identifiability results to parametric instance discrimination, then show how insights transfer to the ubiquitous setting of supervised learning with cross-entropy minimization. We prove that even in standard classification tasks, models learn representations of ground-truth factors of variation up to a linear transformation. We corroborate our theoretical contribution with a series of empirical studies. First, using simulated data matching our theoretical assumptions, we demonstrate successful disentanglement of latent factors. Second, we show that on DisLib, a widely-used disentanglement benchmark, simple classification tasks recover latent structures up to linear transformations. Finally, we reveal that models trained on ImageNet encode representations that permit linear decoding of proxy factors of variation. Together, our theoretical findings and experiments offer a compelling explanation for recent observations of linear representations, such as superposition in neural networks. This work takes a significant step toward a cohesive theory that accounts for the unreasonable effectiveness of supervised deep learning.

Via

Access Paper or Ask Questions

Occam's Razor for Self Supervised Learning: What is Sufficient to Learn Good Representations?

Jun 15, 2024

Mark Ibrahim, David Klindt, Randall Balestriero

Figure 1 for Occam's Razor for Self Supervised Learning: What is Sufficient to Learn Good Representations?

Figure 2 for Occam's Razor for Self Supervised Learning: What is Sufficient to Learn Good Representations?

Figure 3 for Occam's Razor for Self Supervised Learning: What is Sufficient to Learn Good Representations?

Figure 4 for Occam's Razor for Self Supervised Learning: What is Sufficient to Learn Good Representations?

Abstract:Deep Learning is often depicted as a trio of data-architecture-loss. Yet, recent Self Supervised Learning (SSL) solutions have introduced numerous additional design choices, e.g., a projector network, positive views, or teacher-student networks. These additions pose two challenges. First, they limit the impact of theoretical studies that often fail to incorporate all those intertwined designs. Second, they slow-down the deployment of SSL methods to new domains as numerous hyper-parameters need to be carefully tuned. In this study, we bring forward the surprising observation that--at least for pretraining datasets of up to a few hundred thousands samples--the additional designs introduced by SSL do not contribute to the quality of the learned representations. That finding not only provides legitimacy to existing theoretical studies, but also simplifies the practitioner's path to SSL deployment in numerous small and medium scale settings. Our finding answers a long-lasting question: the often-experienced sensitivity to training settings and hyper-parameters encountered in SSL come from their design, rather than the absence of supervised guidance.

Via

Access Paper or Ask Questions

Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning

Jun 10, 2024

Daniel Kunin, Allan Raventós, Clémentine Dominé, Feng Chen, David Klindt, Andrew Saxe, Surya Ganguli

Figure 1 for Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning

Figure 2 for Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning

Figure 3 for Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning

Figure 4 for Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning

Abstract:While the impressive performance of modern neural networks is often attributed to their capacity to efficiently extract task-relevant features from data, the mechanisms underlying this rich feature learning regime remain elusive, with much of our theoretical understanding stemming from the opposing lazy regime. In this work, we derive exact solutions to a minimal model that transitions between lazy and rich learning, precisely elucidating how unbalanced layer-specific initialization variances and learning rates determine the degree of feature learning. Our analysis reveals that they conspire to influence the learning regime through a set of conserved quantities that constrain and modify the geometry of learning trajectories in parameter and function space. We extend our analysis to more complex linear models with multiple neurons, outputs, and layers and to shallow nonlinear networks with piecewise linear activation functions. In linear networks, rapid feature learning only occurs with balanced initializations, where all layers learn at similar speeds. While in nonlinear networks, unbalanced initializations that promote faster learning in earlier layers can accelerate rich learning. Through a series of experiments, we provide evidence that this unbalanced rich regime drives feature learning in deep finite-width networks, promotes interpretability of early layers in CNNs, reduces the sample complexity of learning hierarchical data, and decreases the time to grokking in modular arithmetic. Our theory motivates further exploration of unbalanced initializations to enhance efficient feature learning.

* 40 pages, 12 figures

Via

Access Paper or Ask Questions

Identifying Interpretable Visual Features in Artificial and Biological Neural Systems

Oct 18, 2023

David Klindt, Sophia Sanborn, Francisco Acosta, Frédéric Poitevin, Nina Miolane

Figure 1 for Identifying Interpretable Visual Features in Artificial and Biological Neural Systems

Figure 2 for Identifying Interpretable Visual Features in Artificial and Biological Neural Systems

Figure 3 for Identifying Interpretable Visual Features in Artificial and Biological Neural Systems

Figure 4 for Identifying Interpretable Visual Features in Artificial and Biological Neural Systems

Abstract:Single neurons in neural networks are often interpretable in that they represent individual, intuitively meaningful features. However, many neurons exhibit $\textit{mixed selectivity}$, i.e., they represent multiple unrelated features. A recent hypothesis proposes that features in deep networks may be represented in $\textit{superposition}$, i.e., on non-orthogonal axes by multiple neurons, since the number of possible interpretable features in natural data is generally larger than the number of neurons in a given network. Accordingly, we should be able to find meaningful directions in activation space that are not aligned with individual neurons. Here, we propose (1) an automated method for quantifying visual interpretability that is validated against a large database of human psychophysics judgments of neuron interpretability, and (2) an approach for finding meaningful directions in network activation space. We leverage these methods to discover directions in convolutional neural networks that are more intuitively meaningful than individual neurons, as we confirm and investigate in a series of analyses. Moreover, we apply the same method to three recent datasets of visual neural responses in the brain and find that our conclusions largely transfer to real neural data, suggesting that superposition might be deployed by the brain. This also provides a link with disentanglement and raises fundamental questions about robust, efficient and factorized representations in both artificial and biological neural systems.

Via

Access Paper or Ask Questions

Towards Nonlinear Disentanglement in Natural Data with Temporal Sparse Coding

Jul 21, 2020

David Klindt, Lukas Schott, Yash Sharma, Ivan Ustyuzhaninov, Wieland Brendel, Matthias Bethge, Dylan Paiton

Figure 1 for Towards Nonlinear Disentanglement in Natural Data with Temporal Sparse Coding

Figure 2 for Towards Nonlinear Disentanglement in Natural Data with Temporal Sparse Coding

Figure 3 for Towards Nonlinear Disentanglement in Natural Data with Temporal Sparse Coding

Figure 4 for Towards Nonlinear Disentanglement in Natural Data with Temporal Sparse Coding

Abstract:We construct an unsupervised learning model that achieves nonlinear disentanglement of underlying factors of variation in naturalistic videos. Previous work suggests that representations can be disentangled if all but a few factors in the environment stay constant at any point in time. As a result, algorithms proposed for this problem have only been tested on carefully constructed datasets with this exact property, leaving it unclear whether they will transfer to natural scenes. Here we provide evidence that objects in segmented natural movies undergo transitions that are typically small in magnitude with occasional large jumps, which is characteristic of a temporally sparse distribution. We leverage this finding and present SlowVAE, a model for unsupervised representation learning that uses a sparse prior on temporally adjacent observations to disentangle generative factors without any assumptions on the number of changing factors. We provide a proof of identifiability and show that the model reliably learns disentangled representations on several established benchmark datasets, often surpassing the current state-of-the-art. We additionally demonstrate transferability towards video datasets with natural dynamics, Natural Sprites and KITTI Masks, which we contribute as benchmarks for guiding disentanglement research towards more natural data domains.

* Code is available at https://github.com/bethgelab/slow_disentanglement. The first three authors, as well as the last two authors, contributed equally

Via

Access Paper or Ask Questions