Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Giovanni Luca Marchetti

Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks

Jun 06, 2025

Daniel Kunin, Giovanni Luca Marchetti, Feng Chen, Dhruva Karkada, James B. Simon, Michael R. DeWeese, Surya Ganguli, Nina Miolane

Abstract:What features neural networks learn, and how, remains an open question. In this paper, we introduce Alternating Gradient Flows (AGF), an algorithmic framework that describes the dynamics of feature learning in two-layer networks trained from small initialization. Prior works have shown that gradient flow in this regime exhibits a staircase-like loss curve, alternating between plateaus where neurons slowly align to useful directions and sharp drops where neurons rapidly grow in norm. AGF approximates this behavior as an alternating two-step process: maximizing a utility function over dormant neurons and minimizing a cost function over active ones. AGF begins with all neurons dormant. At each round, a dormant neuron activates, triggering the acquisition of a feature and a drop in the loss. AGF quantifies the order, timing, and magnitude of these drops, matching experiments across architectures. We show that AGF unifies and extends existing saddle-to-saddle analyses in fully connected linear networks and attention-only linear transformers, where the learned features are singular modes and principal components, respectively. In diagonal linear networks, we prove AGF converges to gradient flow in the limit of vanishing initialization. Applying AGF to quadratic networks trained to perform modular addition, we give the first complete characterization of the training dynamics, revealing that networks learn Fourier features in decreasing order of coefficient magnitude. Altogether, AGF offers a promising step towards understanding feature learning in neural networks.

* 35 pages, 7 figures

Via

Access Paper or Ask Questions

Learning on a Razor's Edge: the Singularity Bias of Polynomial Neural Networks

May 17, 2025

Vahid Shahverdi, Giovanni Luca Marchetti, Kathlén Kohn

Abstract:Deep neural networks often infer sparse representations, converging to a subnetwork during the learning process. In this work, we theoretically analyze subnetworks and their bias through the lens of algebraic geometry. We consider fully-connected networks with polynomial activation functions, and focus on the geometry of the function space they parametrize, often referred to as neuromanifold. First, we compute the dimension of the subspace of the neuromanifold parametrized by subnetworks. Second, we show that this subspace is singular. Third, we argue that such singularities often correspond to critical points of the training dynamics. Lastly, we discuss convolutional networks, for which subnetworks and singularities are similarly related, but the bias does not arise.

Via

Access Paper or Ask Questions

An Invitation to Neuroalgebraic Geometry

Jan 31, 2025

Giovanni Luca Marchetti, Vahid Shahverdi, Stefano Mereta, Matthew Trager, Kathlén Kohn

Figure 1 for An Invitation to Neuroalgebraic Geometry

Figure 2 for An Invitation to Neuroalgebraic Geometry

Figure 3 for An Invitation to Neuroalgebraic Geometry

Figure 4 for An Invitation to Neuroalgebraic Geometry

Abstract:In this expository work, we promote the study of function spaces parameterized by machine learning models through the lens of algebraic geometry. To this end, we focus on algebraic models, such as neural networks with polynomial activations, whose associated function spaces are semi-algebraic varieties. We outline a dictionary between algebro-geometric invariants of these varieties, such as dimension, degree, and singularities, and fundamental aspects of machine learning, such as sample complexity, expressivity, training dynamics, and implicit bias. Along the way, we review the literature and discuss ideas beyond the algebraic domain. This work lays the foundations of a research direction bridging algebraic geometry and deep learning, that we refer to as neuroalgebraic geometry.

Via

Access Paper or Ask Questions

On the Geometry and Optimization of Polynomial Convolutional Networks

Oct 01, 2024

Vahid Shahverdi, Giovanni Luca Marchetti, Kathlén Kohn

Figure 1 for On the Geometry and Optimization of Polynomial Convolutional Networks

Figure 2 for On the Geometry and Optimization of Polynomial Convolutional Networks

Figure 3 for On the Geometry and Optimization of Polynomial Convolutional Networks

Figure 4 for On the Geometry and Optimization of Polynomial Convolutional Networks

Abstract:We study convolutional neural networks with monomial activation functions. Specifically, we prove that their parameterization map is regular and is an isomorphism almost everywhere, up to rescaling the filters. By leveraging on tools from algebraic geometry, we explore the geometric properties of the image in function space of this map -- typically referred to as neuromanifold. In particular, we compute the dimension and the degree of the neuromanifold, which measure the expressivity of the model, and describe its singularities. Moreover, for a generic large dataset, we derive an explicit formula that quantifies the number of critical points arising in the optimization of a regression loss.

Via

Access Paper or Ask Questions

Relative Representations: Topological and Geometric Perspectives

Sep 17, 2024

Alejandro García-Castellanos, Giovanni Luca Marchetti, Danica Kragic, Martina Scolamiero

Abstract:Relative representations are an established approach to zero-shot model stitching, consisting of a non-trainable transformation of the latent space of a deep neural network. Based on insights of topological and geometric nature, we propose two improvements to relative representations. First, we introduce a normalization procedure in the relative transformation, resulting in invariance to non-isotropic rescalings and permutations. The latter coincides with the symmetries in parameter space induced by common activation functions. Second, we propose to deploy topological densification when fine-tuning relative representations, a topological regularization loss encouraging clustering within classes. We provide an empirical investigation on a natural language task, where both the proposed variations yield improved performance on zero-shot model stitching.

Via

Access Paper or Ask Questions

Geometry of Lightning Self-Attention: Identifiability and Dimension

Aug 30, 2024

Nathan W. Henry, Giovanni Luca Marchetti, Kathlén Kohn

Figure 1 for Geometry of Lightning Self-Attention: Identifiability and Dimension

Figure 2 for Geometry of Lightning Self-Attention: Identifiability and Dimension

Figure 3 for Geometry of Lightning Self-Attention: Identifiability and Dimension

Figure 4 for Geometry of Lightning Self-Attention: Identifiability and Dimension

Abstract:We consider function spaces defined by self-attention networks without normalization, and theoretically analyze their geometry. Since these networks are polynomial, we rely on tools from algebraic geometry. In particular, we study the identifiability of deep attention by providing a description of the generic fibers of the parametrization for an arbitrary number of layers and, as a consequence, compute the dimension of the function space. Additionally, for a single-layer model, we characterize the singular and boundary points. Finally, we formulate a conjectural extension of our results to normalized self-attention networks, prove it for a single layer, and numerically verify it in the deep case.

Via

Access Paper or Ask Questions

Hyperbolic Delaunay Geometric Alignment

Apr 12, 2024

Aniss Aiman Medbouhi, Giovanni Luca Marchetti, Vladislav Polianskii, Alexander Kravberg, Petra Poklukar, Anastasia Varava, Danica Kragic

Abstract:Hyperbolic machine learning is an emerging field aimed at representing data with a hierarchical structure. However, there is a lack of tools for evaluation and analysis of the resulting hyperbolic data representations. To this end, we propose Hyperbolic Delaunay Geometric Alignment (HyperDGA) -- a similarity score for comparing datasets in a hyperbolic space. The core idea is counting the edges of the hyperbolic Delaunay graph connecting datapoints across the given sets. We provide an empirical investigation on synthetic and real-life biological data and demonstrate that HyperDGA outperforms the hyperbolic version of classical distances between sets. Furthermore, we showcase the potential of HyperDGA for evaluating latent representations inferred by a Hyperbolic Variational Auto-Encoder.

Via

Access Paper or Ask Questions

Harmonics of Learning: Universal Fourier Features Emerge in Invariant Networks

Dec 23, 2023

Giovanni Luca Marchetti, Christopher Hillar, Danica Kragic, Sophia Sanborn

Abstract:In this work, we formally prove that, under certain conditions, if a neural network is invariant to a finite group then its weights recover the Fourier transform on that group. This provides a mathematical explanation for the emergence of Fourier features -- a ubiquitous phenomenon in both biological and artificial learning systems. The results hold even for non-commutative groups, in which case the Fourier transform encodes all the irreducible unitary group representations. Our findings have consequences for the problem of symmetry discovery. Specifically, we demonstrate that the algebraic structure of an unknown group can be recovered from the weights of a network that is at least approximately invariant within certain bounds. Overall, this work contributes to a foundation for an algebraic learning theory of invariant neural network representations.

Via

Access Paper or Ask Questions

Neural Lattice Reduction: A Self-Supervised Geometric Deep Learning Approach

Nov 14, 2023

Giovanni Luca Marchetti, Gabriele Cesa, Kumar Pratik, Arash Behboodi

Abstract:Lattice reduction is a combinatorial optimization problem aimed at finding the most orthogonal basis in a given lattice. In this work, we address lattice reduction via deep learning methods. We design a deep neural model outputting factorized unimodular matrices and train it in a self-supervised manner by penalizing non-orthogonal lattice bases. We incorporate the symmetries of lattice reduction into the model by making it invariant and equivariant with respect to appropriate continuous and discrete groups.

* Symmetry and Geometry in Neural Representations - NeurReps Workshop @ NeurIPS 2023

Via

Access Paper or Ask Questions

Learning Geometric Representations of Objects via Interaction

Sep 11, 2023

Alfredo Reichlin, Giovanni Luca Marchetti, Hang Yin, Anastasiia Varava, Danica Kragic

Figure 1 for Learning Geometric Representations of Objects via Interaction

Figure 2 for Learning Geometric Representations of Objects via Interaction

Figure 3 for Learning Geometric Representations of Objects via Interaction

Figure 4 for Learning Geometric Representations of Objects via Interaction

Abstract:We address the problem of learning representations from observations of a scene involving an agent and an external object the agent interacts with. To this end, we propose a representation learning framework extracting the location in physical space of both the agent and the object from unstructured observations of arbitrary nature. Our framework relies on the actions performed by the agent as the only source of supervision, while assuming that the object is displaced by the agent via unknown dynamics. We provide a theoretical foundation and formally prove that an ideal learner is guaranteed to infer an isometric representation, disentangling the agent from the object and correctly extracting their locations. We evaluate empirically our framework on a variety of scenarios, showing that it outperforms vision-based approaches such as a state-of-the-art keypoint extractor. We moreover demonstrate how the extracted representations enable the agent to solve downstream tasks via reinforcement learning in an efficient manner.

Via

Access Paper or Ask Questions