Holger Rauhut

The Riemannian Geometry associated to Gradient Flows of Linear Convolutional Networks

Jul 08, 2025

El Mehdi Achour, Kathlén Kohn, Holger Rauhut

Abstract: We study geometric properties of the gradient flow for learning deep linear convolutional networks. For linear fully connected networks, it has been shown recently that the corresponding gradient flow on parameter space can be written as a Riemannian gradient flow on function space (i.e., on the product of weight matrices) if the initialization satisfies a so-called balancedness condition. We establish that the gradient flow on parameter space for learning linear convolutional networks can be written as a Riemannian gradient flow on function space regardless of the initialization. This result holds for $D$-dimensional convolutions with $D \geq 2$, and for $D =1$ it holds if all so-called strides of the convolutions are greater than one. The corresponding Riemannian metric depends on the initialization.

With or Without Replacement? Improving Confidence in Fourier Imaging

Jul 18, 2024

Frederik Hoppe, Claudio Mayrink Verdun, Felix Krahmer, Marion I. Menzel, Holger Rauhut

Abstract: Over the last few years, debiased estimators have been proposed in order to establish rigorous confidence intervals for high-dimensional problems in machine learning and data science. The core argument is that the error of these estimators with respect to the ground truth can be expressed as a Gaussian variable plus a remainder term that vanishes as long as the dimension of the problem is sufficiently high. Thus, uncertainty quantification (UQ) can be performed exploiting the Gaussian model. Empirically, however, the remainder term cannot be neglected in many realistic situations of moderately-sized dimensions, in particular in certain structured measurement scenarios such as Magnetic Resonance Imaging (MRI). This, in turn, can downgrade the advantage of the UQ methods as compared to non-UQ approaches such as the standard LASSO. In this paper, we present a method to improve the debiased estimator by sampling without replacement. Our approach leverages our recent results on the random structure of certain sampling schemes, which show how a transition between sampling with and without replacement can lead to a weighted reconstruction scheme with improved performance for the standard LASSO. We illustrate how this reweighted sampling idea can also improve the debiased estimator and, consequently, provide a better method for UQ in Fourier imaging.

* Accepted at Cosera 2024

High-Dimensional Confidence Regions in Sparse MRI

Jul 18, 2024

Frederik Hoppe, Felix Krahmer, Claudio Mayrink Verdun, Marion Menzel, Holger Rauhut

Abstract: One of the most promising solutions for uncertainty quantification in high-dimensional statistics is the debiased LASSO that relies on unconstrained $\ell_1$-minimization. The initial works focused on real Gaussian designs as a toy model for this problem. However, in medical imaging applications, such as compressive sensing for MRI, the measurement system is represented by a (subsampled) complex Fourier matrix. The purpose of this work is to extend the method to the MRI case in order to construct confidence intervals for each pixel of an MR image. We show that a sufficient amount of data is $n \gtrsim \max\{ s_0\log^2 s_0\log p, s_0 \log^2 p \}$.

* Recognized with Best Student Paper Award at ICASSP 2023. arXiv admin note: substantial text overlap with arXiv:2212.14864
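
To get a feel for the sample bound quoted in the abstract above, the right-hand side can be evaluated numerically. This is only a sketch: the function name `mri_sample_bound`, the use of natural logarithms, and the constant `c` (which the $\gtrsim$ notation leaves unspecified) are assumptions made here for illustration.

```python
import math


def mri_sample_bound(s0: int, p: int, c: float = 1.0) -> float:
    """Evaluate c * max{s0 log^2(s0) log(p), s0 log^2(p)}.

    s0 is the sparsity level and p the ambient dimension (number of pixels).
    The constant c and the natural logarithm are assumptions of this sketch.
    """
    if s0 < 1 or p < 2:
        raise ValueError("need s0 >= 1 and p >= 2")
    log_s0 = math.log(s0)
    log_p = math.log(p)
    first = s0 * log_s0 ** 2 * log_p   # s0 log^2(s0) log(p)
    second = s0 * log_p ** 2           # s0 log^2(p)
    return c * max(first, second)


if __name__ == "__main__":
    # Hypothetical sizes: a 256 x 256 image with two different sparsity levels.
    for s0 in (10, 1000):
        print(s0, round(mri_sample_bound(s0, 256 * 256)))
```

The first term dominates once $\log^2 s_0 > \log p$; for smaller sparsity levels the second term sets the bound.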

Non-Asymptotic Uncertainty Quantification in High-Dimensional Learning

Jul 18, 2024

Frederik Hoppe, Claudio Mayrink Verdun, Hannah Laus, Felix Krahmer, Holger Rauhut

Abstract: Uncertainty quantification (UQ) is a crucial but challenging task in many high-dimensional regression or learning problems to increase the confidence of a given predictor. We develop a new data-driven approach for UQ in regression that applies both to classical regression approaches such as the LASSO as well as to neural networks. One of the most notable UQ techniques is the debiased LASSO, which modifies the LASSO to allow for the construction of asymptotic confidence intervals by decomposing the estimation error into a Gaussian and an asymptotically vanishing bias component. However, in real-world problems with finite-dimensional data, the bias term is often too significant to be neglected, resulting in overly narrow confidence intervals. Our work rigorously addresses this issue and derives a data-driven adjustment that corrects the confidence intervals for a large class of predictors by estimating the means and variances of the bias terms from training data, exploiting high-dimensional concentration phenomena. This gives rise to non-asymptotic confidence intervals, which can help avoid overestimating uncertainty in critical applications such as MRI diagnosis. Importantly, our analysis extends beyond sparse regression to data-driven predictors like neural networks, enhancing the reliability of model-based deep learning. Our findings bridge the gap between established theory and the practical applicability of such debiased methods.
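
The decomposition of the estimation error into a Gaussian and a bias component, mentioned in the abstract above, can be written out in the standard debiased LASSO setting. The notation is an assumption made here and is not taken from the paper: a real linear model $y = A x^0 + \varepsilon$ with $A \in \mathbb{R}^{n \times p}$ and $\varepsilon \sim \mathcal{N}(0, \sigma^2 I_n)$, a LASSO solution $\hat{x}$, the empirical covariance $\hat{\Sigma} = \frac{1}{n} A^\top A$, and a matrix $M \in \mathbb{R}^{p \times p}$ chosen as an approximate inverse of $\hat{\Sigma}$.

```latex
\begin{align*}
\hat{x}^{u} &= \hat{x} + \tfrac{1}{n} M A^\top (y - A\hat{x}), \\
\hat{x}^{u} - x^{0}
  &= (\hat{x} - x^{0}) + \tfrac{1}{n} M A^\top \bigl(A(x^{0} - \hat{x}) + \varepsilon\bigr) \\
  &= \underbrace{\tfrac{1}{n} M A^\top \varepsilon}_{W}
   \;+\; \underbrace{(M\hat{\Sigma} - I_p)(x^{0} - \hat{x})}_{R}.
\end{align*}
```

Conditionally on $A$, the term $W$ is Gaussian with covariance $\frac{\sigma^2}{n} M \hat{\Sigma} M^\top$, while $R$ is the bias (remainder) term. Asymptotic confidence intervals neglect $R$; the abstract describes estimating its mean and variance from training data instead.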

Uncertainty quantification for learned ISTA

Sep 14, 2023

Frederik Hoppe, Claudio Mayrink Verdun, Felix Krahmer, Hannah Laus, Holger Rauhut

Abstract: Model-based deep learning solutions to inverse problems have attracted increasing attention in recent years as they bridge state-of-the-art numerical performance with interpretability. In addition, the incorporated prior domain knowledge can make the training more efficient as the smaller number of parameters allows the training step to be executed with smaller datasets. Algorithm unrolling schemes stand out among these model-based learning techniques. Despite their rapid advancement and their close connection to traditional high-dimensional statistical methods, they lack certainty estimates and a theory for uncertainty quantification is still elusive. This work provides a step towards closing this gap by proposing a rigorous way to obtain confidence intervals for the LISTA estimator.

* to appear at the 33rd IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2023)

Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models

Jun 22, 2023

Leonardo Galli, Holger Rauhut, Mark Schmidt

Abstract: Recent works have shown that line search methods can speed up Stochastic Gradient Descent (SGD) and Adam in modern over-parameterized settings. However, existing line searches may take steps that are smaller than necessary since they require a monotone decrease of the (mini-)batch objective function. We explore nonmonotone line search methods to relax this condition and possibly accept larger step sizes. Despite the lack of a monotonic decrease, we prove the same fast rates of convergence as in the monotone case. Our experiments show that nonmonotone methods improve the speed of convergence and generalization properties of SGD/Adam even beyond the previous monotone line searches. We propose a POlyak NOnmonotone Stochastic (PoNoS) method, obtained by combining a nonmonotone line search with a Polyak initial step size. Furthermore, we develop a new resetting technique that in the majority of the iterations reduces the number of backtracks to zero while still maintaining a large initial step size. To the best of our knowledge, a first runtime comparison shows that the epoch-wise advantage of line-search-based methods gets reflected in the overall computational time.
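
The combination of a nonmonotone line search with a Polyak initial step size can be illustrated on a deterministic toy problem. The sketch below is not the PoNoS algorithm from the paper: it uses the full objective instead of mini-batches, it takes the nonmonotone reference value to be the maximum over a window of recent objective values (an assumption chosen here), and the function names and the constants `c`, `c_p`, `beta`, and `window` are all hypothetical.

```python
from collections import deque


def nonmonotone_step(f, grad, x, history, f_star=0.0,
                     c=0.5, c_p=1.0, beta=0.5, max_backtracks=30):
    """One gradient step with a Polyak initial step size and a nonmonotone
    Armijo test against the largest recent objective value in `history`."""
    g = grad(x)
    g_sq = sum(gi * gi for gi in g)
    if g_sq == 0.0:
        return list(x), 0.0                  # stationary point, nothing to do
    eta = (f(x) - f_star) / (c_p * g_sq)     # Polyak initial step size
    ref = max(history)                       # nonmonotone reference value
    x_new = list(x)
    for _ in range(max_backtracks):
        x_new = [xi - eta * gi for xi, gi in zip(x, g)]
        if f(x_new) <= ref - c * eta * g_sq:
            break                            # sufficient decrease w.r.t. ref
        eta *= beta                          # backtrack
    # If the loop runs out of backtracks, the last trial point is returned.
    return x_new, eta


def minimize(f, grad, x0, window=10, steps=200):
    """Repeat nonmonotone steps, keeping the last `window` objective values."""
    x = list(x0)
    history = deque([f(x)], maxlen=window)
    for _ in range(steps):
        x, _ = nonmonotone_step(f, grad, x, history)
        history.append(f(x))
    return x


if __name__ == "__main__":
    # Toy quadratic with minimum value f_star = 0 at the origin.
    f = lambda x: 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)
    grad = lambda x: [x[0], 10.0 * x[1]]
    print(f(minimize(f, grad, [1.0, 1.0])))
```

Setting `window=1` recovers the usual monotone Armijo test, since the reference value is then the current objective value.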

Robust Implicit Regularization via Weight Normalization

May 09, 2023

Hung-Hsu Chou, Holger Rauhut, Rachel Ward

Abstract: Overparameterized models may have many interpolating solutions; implicit regularization refers to the hidden preference of a particular optimization method towards a certain interpolating solution among the many. A by now established line of work has shown that (stochastic) gradient descent tends to have an implicit bias towards low rank and/or sparse solutions when used to train deep linear networks, explaining to some extent why overparameterized neural network models trained by gradient descent tend to have good generalization performance in practice. However, existing theory for square-loss objectives often requires very small initialization of the trainable weights, which is at odds with the larger scale at which weights are initialized in practice for faster convergence and better generalization performance. In this paper, we aim to close this gap by incorporating and analyzing gradient descent with weight normalization, where the weight vector is reparameterized in terms of polar coordinates, and gradient descent is applied to the polar coordinates. By analyzing key invariants of the gradient flow and using Lojasiewicz's Theorem, we show that weight normalization also has an implicit bias towards sparse solutions in the diagonal linear model, but that in contrast to plain gradient descent, weight normalization enables a robust bias that persists even if the weights are initialized at practically large scale. Experiments suggest that the gains in both convergence speed and robustness of the implicit bias are improved dramatically by using weight normalization in overparameterized diagonal linear network models.

More is Less: Inducing Sparsity via Overparameterization

Dec 21, 2021

Hung-Hsu Chou, Johannes Maly, Holger Rauhut

Abstract: In deep learning it is common to overparameterize the neural networks, that is, to use more parameters than training samples. Quite surprisingly, training the neural network via (stochastic) gradient descent leads to models that generalize very well, while classical statistics would suggest overfitting. In order to gain understanding of this implicit bias phenomenon, we study the special case of sparse recovery (compressive sensing), which is of interest on its own. More precisely, in order to reconstruct a vector from underdetermined linear measurements, we introduce a corresponding overparameterized square loss functional, where the vector to be reconstructed is deeply factorized into several vectors. We show that, under a very mild assumption on the measurement matrix, vanilla gradient flow for the overparameterized loss functional converges to a solution of minimal $\ell_1$-norm. The latter is well-known to promote sparse solutions. As a by-product, our results significantly improve the sample complexity for compressive sensing in previous works. The theory accurately predicts the recovery rate in numerical experiments. For the proofs, we introduce the concept of "solution entropy", which bypasses the obstacles caused by non-convexity and should be of independent interest.
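
The effect described in the abstract above can be observed on a tiny example. The sketch below is a simplification and not the paper's setting: it replaces gradient flow by plain gradient descent with a small step size, uses a depth-2 factorization $x = u \odot u$ (so only nonnegative vectors are reachable), and starts from a small constant initialization `alpha`. The function name and all parameter values are hypothetical.

```python
def overparam_gd(A, y, alpha=1e-3, lr=0.01, steps=5000):
    """Gradient descent on L(u) = 0.5 * ||A (u*u) - y||^2 with u = alpha * 1.

    Returns x = u*u (entrywise square). With small alpha the limit tends to
    be close to an interpolating solution of small l1-norm (a sketch, not a
    statement of the paper's theorem).
    """
    m, n = len(A), len(A[0])
    u = [alpha] * n
    for _ in range(steps):
        x = [ui * ui for ui in u]
        # Residual r = A x - y.
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        # Chain rule: dL/du_j = 2 * u_j * (A^T r)_j.
        atr = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        u = [u[j] - lr * 2.0 * u[j] * atr[j] for j in range(n)]
    return [ui * ui for ui in u]


if __name__ == "__main__":
    # One measurement, two unknowns: x_1 + 2 x_2 = 2.
    x = overparam_gd([[1.0, 2.0]], [2.0])
    print(x, sum(abs(xi) for xi in x))
```

For this system the minimal $\ell_1$-norm solution is $(0, 1)$ with norm $1$, whereas the minimal $\ell_2$-norm solution is $(0.4, 0.8)$ with $\ell_1$-norm $1.2$; the iterate returned by the sketch lands near the former.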

Generalization Error Bounds for Iterative Recovery Algorithms Unfolded as Neural Networks

Dec 08, 2021

Ekkehard Schnoor, Arash Behboodi, Holger Rauhut

Abstract: Motivated by the learned iterative soft thresholding algorithm (LISTA), we introduce a general class of neural networks suitable for sparse reconstruction from few linear measurements. By allowing a wide range of degrees of weight-sharing between the layers, we enable a unified analysis for very different neural network types, ranging from recurrent ones to networks more similar to standard feedforward neural networks. Based on training samples, via empirical risk minimization we aim at learning the optimal network parameters and thereby the optimal network that reconstructs signals from their low-dimensional linear measurements. We derive generalization bounds by analyzing the Rademacher complexity of hypothesis classes consisting of such deep networks, that also take into account the thresholding parameters. We obtain estimates of the sample complexity that essentially depend only linearly on the number of parameters and on the depth. We apply our main result to obtain specific generalization bounds for several practical examples, including different algorithms for (implicit) dictionary learning, and convolutional neural networks.

* 29 pages, 6 figures
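
For readers unfamiliar with the iteration that LISTA unrolls, the classical (non-learned) iterative soft thresholding algorithm can be sketched as follows. This is the textbook ISTA for $\min_x \frac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_1$, not the network class analyzed in the paper; the function names and the parameter choices in the example are assumptions made here.

```python
import math


def soft_threshold(v, t):
    """Entrywise soft thresholding: shrink each entry towards zero by t."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def ista(A, y, lam, tau, iters=500):
    """Classical ISTA: x <- S_{tau*lam}(x + tau * A^T (y - A x)).

    tau is the step size; the usual convergence guarantee assumes
    tau <= 1 / ||A||^2 (spectral norm), which this sketch does not check.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = y - A x.
        r = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # Gradient step on the quadratic term, then thresholding.
        z = [x[j] + tau * sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        x = soft_threshold(z, tau * lam)
    return x


if __name__ == "__main__":
    # With A = identity and tau = 1, ISTA returns soft_threshold(y, lam).
    print(ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.2], lam=1.0, tau=1.0))
```

In an unrolled network each loop iteration becomes one layer, and the fixed matrices and thresholds of the iteration are replaced by trainable parameters; sharing those parameters across layers gives a recurrent architecture, while letting them differ per layer gives a feedforward one.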

Spark Deficient Gabor Frames for Inverse Problems

Oct 13, 2021

Vasiliki Kouni, Holger Rauhut

Abstract: In this paper, we apply the star-Digital Gabor Transform to analysis Compressed Sensing and speech denoising. Based on assumptions on the ambient dimension, we produce a window vector that generates a spark deficient Gabor frame with many linear dependencies among its elements. We conduct computational experiments on both synthetic and real-world signals, using three Gabor transforms generated by state-of-the-art window vectors as baselines, and compare their performance to that of the star-Gabor transform. Results show that the proposed star-Gabor transform outperforms all others in all signal cases.

* 2021 Online International Conference on Computational Harmonic Analysis (Online-ICCHA2021)
