Maren Mahsereci

Connecting Parameter Magnitudes and Hessian Eigenspaces at Scale using Sketched Methods

Apr 20, 2025

Andres Fernandez, Frank Schneider, Maren Mahsereci, Philipp Hennig

Abstract: Recently, it has been observed that when training a deep neural net with SGD, the majority of the loss landscape's curvature quickly concentrates in a tiny *top* eigenspace of the loss Hessian, which remains largely stable thereafter. Independently, it has been shown that successful magnitude pruning masks for deep neural nets emerge early in training and remain stable thereafter. In this work, we study these two phenomena jointly and show that they are connected: We develop a methodology to measure the similarity between arbitrary parameter masks and Hessian eigenspaces via Grassmannian metrics. We identify *overlap* as the most useful such metric due to its interpretability and stability. To compute *overlap*, we develop a matrix-free algorithm based on sketched SVDs that allows us to compute over 1000 Hessian eigenpairs for nets with over 10M parameters, an unprecedented scale by several orders of magnitude. Our experiments reveal an *overlap* between magnitude parameter masks and top Hessian eigenspaces that is consistently higher than chance level, and show that this effect is accentuated for larger network sizes. This result indicates that *top Hessian eigenvectors tend to be concentrated around larger parameters*, or equivalently, that *larger parameters tend to align with directions of larger loss curvature*. Our work provides a methodology to approximate and analyze deep learning Hessians at scale, as well as a novel insight into the structure of their eigenspace.

* Accepted at TMLR 2025
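
To make the *overlap* idea concrete, here is a minimal sketch on a toy problem. It assumes that overlap is the normalized squared Frobenius norm of the product of two orthonormal bases, which may differ in detail from the paper's definition. It uses a dense eigendecomposition of a small random symmetric matrix rather than the sketched, matrix-free method the abstract describes. The names `magnitude_mask` and `overlap` are illustrative, not taken from the paper's code.

```python
import numpy as np


def magnitude_mask(params, k):
    """Return the indices of the k largest-magnitude entries of params."""
    return np.argsort(np.abs(params))[-k:]


def overlap(mask_idx, Q):
    """Overlap in [0, 1] between a coordinate mask and the span of Q.

    Q has shape (n, k) with orthonormal columns. The mask subspace has the
    selected identity columns E as its basis, so ||E^T Q||_F^2 is simply the
    squared Frobenius norm of the selected rows of Q.
    """
    k = Q.shape[1]
    return float(np.sum(Q[mask_idx, :] ** 2) / k)


rng = np.random.default_rng(0)
n, k = 200, 10

# Symmetric random matrix as a toy stand-in for a loss Hessian (assumption).
A = rng.standard_normal((n, n))
H = (A + A.T) / 2
eigvals, eigvecs = np.linalg.eigh(H)  # eigenvalues come back in ascending order
Q_top = eigvecs[:, -k:]               # eigenvectors of the k largest eigenvalues

params = rng.standard_normal(n)       # toy parameter vector (assumption)
idx = magnitude_mask(params, k)

value = overlap(idx, Q_top)
chance = k / n  # expected overlap of a k-mask with a uniformly random k-dim subspace
print(f"overlap={value:.3f}, chance level={chance:.3f}")
```

In this toy setup the parameters and the matrix are independent, so the measured overlap should hover around the chance level `k / n`; the abstract's claim is that trained networks sit consistently above it.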

ProbNum: Probabilistic Numerics in Python

Dec 03, 2021

Jonathan Wenger, Nicholas Krämer, Marvin Pförtner, Jonathan Schmidt, Nathanael Bosch, Nina Effenberger, Johannes Zenn, Alexandra Gessner, Toni Karvonen, François-Xavier Briol(+2 more)

Abstract: Probabilistic numerical methods (PNMs) solve numerical problems via probabilistic inference. They have been developed for linear algebra, optimization, integration and differential equation simulation. PNMs naturally incorporate prior information about a problem and quantify uncertainty due to finite computational resources as well as stochastic input. In this paper, we present ProbNum: a Python library providing state-of-the-art probabilistic numerical solvers. ProbNum enables custom composition of PNMs for specific problem classes via a modular design as well as wrappers for off-the-shelf use. Tutorials, documentation, developer guides and benchmarks are available online at www.probnum.org.

Invariant Priors for Bayesian Quadrature

Dec 02, 2021

Masha Naslidnyk, Javier Gonzalez, Maren Mahsereci

Abstract: Bayesian quadrature (BQ) is a model-based numerical integration method that can increase sample efficiency by encoding and leveraging known structure of the integration task at hand. In this paper, we explore priors that encode invariance of the integrand under a set of bijective transformations in the input domain, in particular some unitary transformations, such as rotations, axis-flips, or point symmetries. We show initial results on superior performance in comparison to standard Bayesian quadrature on several synthetic applications and one real-world application.
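
One common way to encode such an invariance in a Gaussian process prior is to average a base kernel over a finite group of transformations. The sketch below illustrates that construction for a point symmetry in 2D. It is an assumption about how an invariant prior could be built, not necessarily the construction used in the paper, and the names `rbf` and `invariant_kernel` are hypothetical.

```python
import numpy as np


def rbf(x, y, lengthscale=1.0):
    """Squared-exponential base kernel between rows of x (m, d) and y (p, d)."""
    diff = x[:, None, :] - y[None, :, :]
    return np.exp(-0.5 * np.sum(diff ** 2, axis=-1) / lengthscale ** 2)


def invariant_kernel(x, y, transforms, base=rbf):
    """Average the base kernel over all pairs of group elements.

    `transforms` is a list of (d, d) matrices that must form a finite group
    (closed under multiplication); otherwise invariance is not guaranteed.
    """
    total = np.zeros((x.shape[0], y.shape[0]))
    for g in transforms:
        for h in transforms:
            total += base(x @ g.T, y @ h.T)
    return total / len(transforms) ** 2


# Point symmetry about the origin in 2D: the group {I, -I}.
G = [np.eye(2), -np.eye(2)]

rng = np.random.default_rng(1)
x = rng.standard_normal((5, 2))
y = rng.standard_normal((4, 2))

K = invariant_kernel(x, y, G)
K_flipped = invariant_kernel(-x, y, G)  # apply the symmetry to the inputs
print(np.allclose(K, K_flipped))
```

Because the sum runs over the whole group, transforming an input only reorders the terms, so samples from a GP with this kernel are invariant under the group action.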

Emulation of physical processes with Emukit

Oct 25, 2021

Andrei Paleyes, Mark Pullin, Maren Mahsereci, Cliff McCollum, Neil D. Lawrence, Javier Gonzalez

Abstract: Decision making in uncertain scenarios is a ubiquitous challenge in real-world systems. Tools to deal with this challenge include simulations to gather information and statistical emulation to quantify uncertainty. The machine learning community has developed a number of methods to facilitate decision making, but so far they are scattered across multiple toolkits and generally rely on a fixed backend. In this paper, we present Emukit, a highly adaptable Python toolkit for enriching decision making under uncertainty. Emukit allows users to: (i) use state-of-the-art methods including Bayesian optimization, multi-fidelity emulation, experimental design, Bayesian quadrature and sensitivity analysis; (ii) easily prototype new decision making methods for new problems. Emukit is agnostic to the underlying modeling framework and enables users to use their own custom models. We show how Emukit can be used in three exemplary case studies.

* Second Workshop on Machine Learning and the Physical Sciences, NeurIPS 2019

A Fourier State Space Model for Bayesian ODE Filters

Jul 22, 2020

Hans Kersting, Maren Mahsereci

Abstract: Gaussian ODE filtering is a probabilistic numerical method to solve ordinary differential equations (ODEs). It computes a Bayesian posterior over the solution from evaluations of the vector field defining the ODE. Its most popular version, which employs an integrated Brownian motion prior, uses Taylor expansions of the mean to extrapolate forward and has the same convergence rates as classical numerical methods. As the solutions of many important ODEs are periodic functions (oscillators), we raise the question of whether Fourier expansions can also be brought to bear within the framework of Gaussian ODE filtering. To this end, we construct a Fourier state space model for ODEs and a 'hybrid' model that combines a Taylor (Brownian motion) and a Fourier state space model. We show by experiments how the hybrid model might become useful in cheaply predicting until the end of the time domain.

* 5 pages, 2 figures, ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models

Active Multi-Information Source Bayesian Quadrature

Mar 27, 2019

Alexandra Gessner, Javier Gonzalez, Maren Mahsereci

Abstract: Bayesian quadrature (BQ) is a sample-efficient probabilistic numerical method to solve integrals of expensive-to-evaluate black-box functions, yet so far, active BQ learning schemes focus merely on the integrand itself as information source, and do not allow for information transfer from cheaper, related functions. Here, we set the scene for active learning in BQ when multiple related information sources of variable cost (in input and source) are accessible. This setting arises, for example, when evaluating the integrand requires a complex simulation to be run that can be approximated by simulating at lower levels of sophistication and at lesser expense. We construct meaningful cost-sensitive multi-source acquisition rates as an extension to common utility functions from vanilla BQ (VBQ), and discuss pitfalls that arise from blindly generalizing. Furthermore, we show that the VBQ acquisition policy is a corner case of all considered cost-sensitive acquisition schemes, which collapse onto one single degenerate policy in the case of one source and constant cost. In proof-of-concept experiments we scrutinize the behavior of our generalized acquisition functions. On an epidemiological model, we demonstrate that active multi-source BQ (AMS-BQ) allocates budget more efficiently than VBQ for learning the integral to a good accuracy.

Probabilistic Line Searches for Stochastic Optimization

Jun 30, 2017

Maren Mahsereci, Philipp Hennig

Abstract: In deterministic optimization, line searches are a standard tool ensuring stability and efficiency. Where only stochastic gradients are available, no direct equivalent has so far been formulated, because uncertain gradients do not allow for a strict sequence of decisions collapsing the search space. We construct a probabilistic line search by combining the structure of existing deterministic methods with notions from Bayesian optimization. Our method retains a Gaussian process surrogate of the univariate optimization objective, and uses a probabilistic belief over the Wolfe conditions to monitor the descent. The algorithm has very low computational cost, and no user-controlled parameters. Experiments show that it effectively removes the need to define a learning rate for stochastic gradient descent.

* Extended version of the NIPS '15 conference paper, includes detailed pseudo-code, 59 pages, 35 figures
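
For reference, the classical (deterministic) weak Wolfe conditions that the abstract builds on can be checked as in the sketch below. This is only the noise-free baseline: the paper replaces these hard checks with a probabilistic belief under a Gaussian process surrogate, which is not implemented here. The function name and the constants `c1` and `c2` are conventional textbook choices, not values taken from the paper.

```python
def wolfe_conditions(f0, df0, ft, dft, t, c1=1e-4, c2=0.9):
    """Check the weak Wolfe conditions for a step of size t along a descent direction.

    f0, df0: objective value and directional derivative at step size 0.
    ft, dft: objective value and directional derivative at step size t.
    Returns (sufficient_decrease, curvature) as two booleans.
    """
    sufficient_decrease = ft <= f0 + c1 * t * df0  # Armijo condition
    curvature = dft >= c2 * df0                    # slope has flattened enough
    return sufficient_decrease, curvature


# Toy univariate objective along the search direction: f(t) = (t - 1)^2.
def f(t):
    return (t - 1.0) ** 2


def df(t):
    return 2.0 * (t - 1.0)


for step in (0.01, 1.0, 2.0):
    ok_decrease, ok_curvature = wolfe_conditions(f(0.0), df(0.0), f(step), df(step), step)
    print(step, ok_decrease, ok_curvature)
```

A step that is too short fails the curvature condition, and a step that overshoots fails the sufficient-decrease condition. With noisy function values and gradients, neither comparison can be evaluated exactly, which is the gap the probabilistic formulation addresses.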

Early Stopping without a Validation Set

Jun 06, 2017

Maren Mahsereci, Lukas Balles, Christoph Lassner, Philipp Hennig

Abstract: Early stopping is a widely used technique to prevent poor generalization performance when training an over-expressive model by means of gradient-based optimization. To find a good point to halt the optimizer, a common practice is to split the dataset into a training and a smaller validation set to obtain an ongoing estimate of the generalization performance. We propose a novel early stopping criterion that is based on fast-to-compute local statistics of the computed gradients and entirely removes the need for a held-out validation set. Our experiments show that this is a viable approach in the setting of least-squares and logistic regression, as well as neural networks.

* 16 pages, 10 figures
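
As an illustration of what a criterion built from local gradient statistics might look like, the sketch below compares the squared mini-batch gradient mean against its per-coordinate sample variance and signals a stop once the mean is no longer distinguishable from sampling noise. The exact form of the statistic, the function name `gradient_noise_stop`, and the use of per-example gradients are assumptions made for this example; consult the paper for the criterion it actually proposes.

```python
import numpy as np


def gradient_noise_stop(per_example_grads):
    """Return (stop, statistic) from a (B, D) array of per-example gradients.

    The statistic is 1 - (B / D) * sum_k mean_k^2 / var_k. If the batch mean
    is pure sampling noise, B * mean_k^2 / var_k is of order one per
    coordinate, so the statistic is near zero; a clear gradient signal makes
    it strongly negative. A stop is signalled when the statistic is positive.
    """
    B, D = per_example_grads.shape
    mean = per_example_grads.mean(axis=0)
    var = per_example_grads.var(axis=0, ddof=1)  # unbiased sample variance
    statistic = 1.0 - (B / D) * np.sum(mean ** 2 / var)
    return bool(statistic > 0), float(statistic)


rng = np.random.default_rng(2)
noise = rng.standard_normal((64, 20))  # toy per-example gradients (assumption)

# Early in training: a strong shared gradient component dominates the noise.
stop_early, stat_early = gradient_noise_stop(noise + 5.0)

# Late in training: the batch mean has vanished, only noise remains.
stop_late, stat_late = gradient_noise_stop(noise - noise.mean(axis=0))

print(stop_early, stop_late)
```

The appeal of a test like this is that it reuses quantities already available during training, so no data has to be held out.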
