Abstract: Recently, it has been observed that when training a deep neural net with SGD, the majority of the loss landscape's curvature quickly concentrates in a tiny *top* eigenspace of the loss Hessian, which remains largely stable thereafter. Independently, it has been shown that successful magnitude pruning masks for deep neural nets emerge early in training and remain stable thereafter. In this work, we study these two phenomena jointly and show that they are connected: We develop a methodology to measure the similarity between arbitrary parameter masks and Hessian eigenspaces via Grassmannian metrics. We identify *overlap* as the most useful such metric due to its interpretability and stability. To compute *overlap*, we develop a matrix-free algorithm based on sketched SVDs that allows us to compute over 1000 Hessian eigenpairs for nets with over 10M parameters -- an unprecedented scale by several orders of magnitude. Our experiments reveal an *overlap* between magnitude parameter masks and top Hessian eigenspaces that is consistently higher than chance level, and show that this effect becomes more pronounced for larger network sizes. This result indicates that *top Hessian eigenvectors tend to be concentrated around larger parameters*, or equivalently, that *larger parameters tend to align with directions of larger loss curvature*. Our work provides a methodology to approximate and analyze deep learning Hessians at scale, as well as a novel insight into the structure of their eigenspace.
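The *overlap* between a parameter mask and a Hessian eigenspace can be illustrated with a minimal NumPy sketch. This is not the paper's sketched-SVD pipeline (which obtains the eigenvectors matrix-free at scale); here we assume orthonormal eigenvectors are already in hand, and `mask_eigenspace_overlap` is a hypothetical helper name. A boolean mask spans the canonical basis vectors of its selected coordinates, so the Grassmannian overlap reduces to the squared mass of the eigenvectors on the masked rows:

```python
import numpy as np

def mask_eigenspace_overlap(eigvecs, mask):
    """Overlap between span{e_i : mask[i]} and span(eigvecs).

    eigvecs: (d, k) array with orthonormal columns.
    mask:    boolean array of shape (d,).
    Returns a value in [0, 1]; for a random k-dim subspace the
    expected value is the chance level mask.sum() / d.
    """
    k = eigvecs.shape[1]
    # tr(P_mask Q Q^T) / k, where P_mask projects onto masked coordinates:
    # restricting Q to the masked rows and summing squares computes the trace.
    return float((eigvecs[mask] ** 2).sum() / k)

rng = np.random.default_rng(0)
d, k = 1000, 50
Q, _ = np.linalg.qr(rng.standard_normal((d, k)))  # random orthonormal basis
mask = rng.random(d) < 0.1                        # keep ~10% of parameters
print(mask_eigenspace_overlap(Q, mask))  # close to the chance level mask.mean()
```

An overlap well above `mask.mean()` for magnitude masks and top Hessian eigenvectors is exactly the paper's reported effect.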
Abstract: Structured large matrices are prevalent in machine learning. A particularly important class is curvature matrices like the Hessian, which are central to understanding the loss landscape of neural nets (NNs), and enable second-order optimization, uncertainty quantification, model pruning, data attribution, and more. However, curvature computations can be challenging due to the complexity of automatic differentiation, and the variety and structural assumptions of curvature proxies, like sparsity and Kronecker factorization. In this position paper, we argue that linear operators -- an interface for performing matrix-vector products -- provide a general, scalable, and user-friendly abstraction to handle curvature matrices. To support this position, we developed $\textit{curvlinops}$, a library that provides curvature matrices through a unified linear operator interface. We demonstrate with $\textit{curvlinops}$ how this interface can hide complexity, simplify applications, be extensible and interoperable with other libraries, and scale to large NNs.
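The core idea of the linear-operator abstraction can be sketched with SciPy's `LinearOperator` (this is an illustration of the concept, not $\textit{curvlinops}$' actual API). A toy quadratic loss stands in for a NN loss; downstream routines such as Lanczos eigensolvers only ever see matrix-vector products, never the matrix itself:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

# Toy quadratic loss L(w) = 0.5 * w^T A w; its Hessian-vector
# product is A @ v. For a real NN, the matvec would call autodiff.
d = 200
rng = np.random.default_rng(0)
M = rng.standard_normal((d, d))
A = M @ M.T / d  # symmetric PSD "Hessian"

# The operator exposes only shape, dtype, and a matvec -- the matrix
# itself is never materialized by the consumer.
hessian_op = LinearOperator(shape=(d, d), matvec=lambda v: A @ v, dtype=A.dtype)

# Any matrix-free routine works unchanged, e.g. top eigenvalues via Lanczos:
top = eigsh(hessian_op, k=3, which="LM", return_eigenvectors=False)
```

Swapping in a different curvature proxy (GGN, K-FAC, a sparse approximation) only changes the `matvec`, which is the interoperability argument the abstract makes.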
Abstract: Polyphonic Piano Transcription has recently experienced substantial progress driven by the application of sophisticated Deep Learning setups and the introduction of new subtasks such as note onset, offset, velocity and pedal detection. In this work, we focus on onset and velocity detection, presenting a convolutional neural network with substantially reduced size (3.1M parameters) and a simple inference scheme that achieves state-of-the-art performance on the MAESTRO dataset for onset detection (F1=96.78%) and establishes a novel baseline for onset+velocity (F1=94.50%), while maintaining real-time capabilities on modest commodity hardware. Furthermore, our proposed ONSETS&VELOCITIES (O&V) model shows that a time resolution of 24ms is competitive, countering recent trends. We provide open-source software to reproduce our results and a real-time demo with a pretrained model.
Abstract: The goal of Unsupervised Anomaly Detection (UAD) is to detect anomalous signals under the condition that only non-anomalous (normal) data is available beforehand. In UAD under Domain-Shift Conditions (UAD-S), data is further exposed to contextual changes that are usually unknown beforehand. Motivated by the difficulties encountered in the UAD-S task presented at the 2021 edition of the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge, we visually inspect Uniform Manifold Approximations and Projections (UMAPs) for log-STFT, log-mel and pretrained Look, Listen and Learn (L3) representations of the DCASE UAD-S dataset. In our exploratory investigation, we look for two qualities, Separability (SEP) and Discriminative Support (DSUP), and formulate several hypotheses that could facilitate diagnosis and development of further representation and detection approaches. Particularly, we hypothesize that input length and pretraining may regulate a relevant tradeoff between SEP and DSUP. Our code as well as the resulting UMAPs and plots are publicly available.