Vahid Shahverdi

Learning on a Razor's Edge: the Singularity Bias of Polynomial Neural Networks

May 17, 2025

Vahid Shahverdi, Giovanni Luca Marchetti, Kathlén Kohn

Abstract: Deep neural networks often infer sparse representations, converging to a subnetwork during the learning process. In this work, we theoretically analyze subnetworks and their bias through the lens of algebraic geometry. We consider fully-connected networks with polynomial activation functions, and focus on the geometry of the function space they parametrize, often referred to as the neuromanifold. First, we compute the dimension of the subspace of the neuromanifold parametrized by subnetworks. Second, we show that this subspace is singular. Third, we argue that such singularities often correspond to critical points of the training dynamics. Lastly, we discuss convolutional networks, for which subnetworks and singularities are similarly related, but the bias does not arise.

An Invitation to Neuroalgebraic Geometry

Jan 31, 2025

Giovanni Luca Marchetti, Vahid Shahverdi, Stefano Mereta, Matthew Trager, Kathlén Kohn

Abstract: In this expository work, we promote the study of function spaces parameterized by machine learning models through the lens of algebraic geometry. To this end, we focus on algebraic models, such as neural networks with polynomial activations, whose associated function spaces are semi-algebraic varieties. We outline a dictionary between algebro-geometric invariants of these varieties, such as dimension, degree, and singularities, and fundamental aspects of machine learning, such as sample complexity, expressivity, training dynamics, and implicit bias. Along the way, we review the literature and discuss ideas beyond the algebraic domain. This work lays the foundations of a research direction bridging algebraic geometry and deep learning, which we refer to as neuroalgebraic geometry.

On the Geometry and Optimization of Polynomial Convolutional Networks

Oct 01, 2024

Vahid Shahverdi, Giovanni Luca Marchetti, Kathlén Kohn

Abstract: We study convolutional neural networks with monomial activation functions. Specifically, we prove that their parameterization map is regular and is an isomorphism almost everywhere, up to rescaling the filters. By leveraging tools from algebraic geometry, we explore the geometric properties of the image of this map in function space, typically referred to as the neuromanifold. In particular, we compute the dimension and the degree of the neuromanifold, which measure the expressivity of the model, and describe its singularities. Moreover, for a generic large dataset, we derive an explicit formula that quantifies the number of critical points arising in the optimization of a regression loss.

Algebraic Complexity and Neurovariety of Linear Convolutional Networks

Jan 29, 2024

Vahid Shahverdi

Abstract: In this paper, we study linear convolutional networks with one-dimensional filters and arbitrary strides. The neuromanifold of such a network is a semialgebraic set, represented by a space of polynomials admitting specific factorizations. Introducing a recursive algorithm, we generate polynomial equations whose common zero locus corresponds to the Zariski closure of the corresponding neuromanifold. Furthermore, we explore the algebraic complexity of training these networks using tools from metric algebraic geometry. Our findings reveal that the number of all complex critical points in the optimization of such a network is equal to the generic Euclidean distance degree of a Segre variety. Notably, this count significantly surpasses the number of critical points encountered in the training of a fully connected linear network with the same number of parameters.

Geometry of Linear Neural Networks: Equivariance and Invariance under Permutation Groups

Sep 24, 2023

Kathlén Kohn, Anna-Laura Sattelberger, Vahid Shahverdi

Abstract: The set of functions parameterized by a linear fully-connected neural network is a determinantal variety. We investigate the subvariety of functions that are equivariant or invariant under the action of a permutation group. Examples of such group actions are translations or $90^\circ$ rotations on images. For such equivariant or invariant subvarieties, we provide an explicit description of their dimension, their degree as well as their Euclidean distance degree, and their singularities. We fully characterize invariance for arbitrary permutation groups, and equivariance for cyclic groups. We draw conclusions for the parameterization and the design of equivariant and invariant linear networks, such as a weight sharing property, and we prove that all invariant linear functions can be learned by linear autoencoders.

* 24 pages, 2 figures, comments welcome!
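
The opening claim of the abstract above can be illustrated with a minimal sketch. It is not taken from the paper: the layer widths (3, 1, 3), the weight values, and the helper name `matmul` are hypothetical choices made for illustration. The end-to-end matrix of a linear network is the product of its weight matrices, so its rank is bounded by the smallest layer width. With a hidden layer of width 1, every 2x2 minor of the end-to-end matrix vanishes, and these vanishing minors are the defining equations of a determinantal variety.

```python
from itertools import combinations


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Hypothetical weights: 3 inputs -> 1 hidden unit -> 3 outputs.
W1 = [[1, -2, 3]]        # first layer, shape 1x3
W2 = [[2], [0], [-1]]    # second layer, shape 3x1

# End-to-end linear map of the network, shape 3x3, rank at most 1.
M = matmul(W2, W1)

# All 2x2 minors of M: pick two rows (i, j) and two columns (k, l).
minors = [
    M[i][k] * M[j][l] - M[i][l] * M[j][k]
    for i, j in combinations(range(3), 2)
    for k, l in combinations(range(3), 2)
]

# Every minor is zero because M factors through a width-1 layer.
assert all(m == 0 for m in minors)
```

Integer weights are used so that the minors are exactly zero rather than zero up to floating-point error.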

Function Space and Critical Points of Linear Convolutional Networks

Apr 12, 2023

Kathlén Kohn, Guido Montúfar, Vahid Shahverdi, Matthew Trager

Abstract: We study the geometry of linear networks with one-dimensional convolutional layers. The function spaces of these networks can be identified with semi-algebraic families of polynomials admitting sparse factorizations. We analyze the impact of the network's architecture on the function space's dimension, boundary, and singular points. We also describe the critical points of the network's parameterization map. Furthermore, we study the optimization problem of training a network with the squared error loss. We prove that for architectures where all strides are larger than one and generic data, the non-zero critical points of that optimization problem are smooth interior points of the function space. This property is known to be false for dense linear networks and linear convolutional networks with stride one.

* 33 pages, 1 figure, 1 table
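
The identification of function spaces with factorized polynomials can be sketched in the simplest setting. This is an illustration under stated assumptions, not the paper's construction: it covers only stride one, uses full (untruncated) convolution, and the filter values and the names `conv1d` and `network` are hypothetical. Under these assumptions, composing convolutional layers is the same as convolving once with the product of the filter polynomials, so the end-to-end filter is a polynomial that factors into the layer filters.

```python
def conv1d(signal, kernel):
    """Full 1D convolution, i.e. multiplication of coefficient lists as polynomials."""
    out = [0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def network(x, filters):
    """Apply stride-one linear convolutional layers one after another."""
    for w in filters:
        x = conv1d(x, w)
    return x


# Hypothetical filters, read as polynomials 1 + 2t and 1 - t + 3t^2.
filters = [[1, 2], [1, -1, 3]]
x = [2, 0, -1, 4]

# End-to-end filter: the product (1 + 2t)(1 - t + 3t^2) = 1 + t + t^2 + 6t^3.
end_to_end = conv1d(filters[0], filters[1])

# The two-layer network and the single end-to-end convolution agree on x.
assert network(x, filters) == conv1d(x, end_to_end)
```

For strides larger than one, the case singled out in the abstract, the layer filters enter the product in a dilated form; that case is not covered by this sketch.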