Abstract:This paper presents a novel approach to audio restoration, focusing on the enhancement of low-quality music recordings, and in particular historical ones. Building upon a previous algorithm called BABE, or Blind Audio Bandwidth Extension, we introduce BABE-2, which presents a series of significant improvements. This research broadens the concept of bandwidth extension to \emph{generative equalization}, a novel task that, to the best of our knowledge, has not been explicitly addressed in previous studies. BABE-2 is built around an optimization algorithm utilizing priors from diffusion models, which are trained or fine-tuned using a curated set of high-quality music tracks. The algorithm simultaneously performs two critical tasks: estimation of the degradation filter's magnitude response and hallucination of the restored audio. The proposed method is objectively evaluated on historical piano recordings, showing a marked improvement over the previous version. The method yields similarly impressive results in rejuvenating the works of renowned vocalists Enrico Caruso and Nellie Melba. This research represents an advancement in the practical restoration of historical music.
Abstract:In this work, we consider the problem of localizing multiple signal sources based on time-difference of arrival (TDOA) measurements. In the blind setting, in which the source signals are not known, the localization task is challenging due to the data association problem. That is, it is not known which of the TDOA measurements correspond to the same source. Herein, we propose to perform joint localization and data association by means of an optimal transport formulation. The method operates by finding optimal groupings of TDOA measurements and associating these with candidate source locations. To allow for computationally feasible localization in three-dimensional space, an efficient set of candidate locations is constructed using a minimal multilateration solver based on minimal sets of receiver pairs. In numerical simulations, we demonstrate that the proposed method is robust both to measurement noise and TDOA detection errors. Furthermore, it is shown that the data association provided by the proposed method allows for statistically efficient estimates of the source locations.
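To make the data association problem concrete, the following minimal Python sketch computes noise-free TDOAs for two sources at every receiver pair. The function name `tdoa`, the propagation speed of 343 m/s, and the geometry are illustrative assumptions and are not taken from the paper. A blind system observes the TDOAs of each receiver pair without source labels, which the sketch mimics by sorting them.

```python
import math
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s; assumed propagation speed (illustrative)


def tdoa(source, receiver_a, receiver_b, c=SPEED_OF_SOUND):
    """Noise-free TDOA in seconds of `source` between two receivers."""
    return (math.dist(source, receiver_a) - math.dist(source, receiver_b)) / c


# Hypothetical geometry: four receivers and two sources in 3-D (metres).
receivers = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)]
sources = [(1.0, 2.0, 1.5), (3.0, 1.0, 0.5)]

# For each receiver pair, keep the TDOAs as a sorted list. Sorting discards
# the source identity, so it is no longer known which entries across pairs
# belong to the same source.
unlabeled = {}
for i, j in combinations(range(len(receivers)), 2):
    unlabeled[(i, j)] = sorted(tdoa(s, receivers[i], receivers[j]) for s in sources)
```

With S sources and P receiver pairs there are up to (S!)^(P-1) distinct ways of grouping such unlabeled measurements, which is why an exhaustive search over associations quickly becomes infeasible.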
Abstract:The ability to accurately estimate room impulse responses (RIRs) is integral to many applications of spatial audio processing. Regrettably, estimating the RIR using ambient signals, such as speech or music, remains a challenging problem due to, e.g., low signal-to-noise ratios, finite sample lengths, and poor spectral excitation. Commonly, in order to improve the conditioning of the estimation problem, priors are placed on the amplitudes of the RIR. Although serving as a regularizer, this type of prior is generally not useful when only approximate knowledge of the delay structure is available, which, for example, is the case when the prior is a simulated RIR from an approximation of the room geometry. In this work, we target the delay structure itself, constructing a prior based on the concept of optimal transport. As illustrated using both simulated and measured data, the resulting method is able to beneficially incorporate information even from simple simulation models, displaying considerable robustness to perturbations in the assumed room dimensions and its temperature.
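As a rough illustration of why a transport-based distance suits delay structures, the sketch below computes the one-dimensional optimal transport cost between two equally weighted sets of delays. It relies on the standard fact that, in one dimension with cost |x - y| and equal weights, matching the sorted points is optimal. The function name `ot_cost_1d` and the delay values are hypothetical, and the prior constructed in the paper is likely more elaborate than this.

```python
def ot_cost_1d(delays_a, delays_b):
    """OT cost between two equally weighted delay sets, using cost |x - y|.

    Assumes both sets are non-empty and contain the same number of delays,
    in which case the sorted matching is an optimal transport plan.
    """
    if not delays_a or len(delays_a) != len(delays_b):
        raise ValueError("this sketch assumes two non-empty sets of equal size")
    pairs = zip(sorted(delays_a), sorted(delays_b))
    return sum(abs(a - b) for a, b in pairs) / len(delays_a)


# Hypothetical reflection delays in milliseconds: a simulated RIR whose
# delays are all shifted by 0.5 ms relative to the measured ones.
simulated_ms = [1.0, 2.0, 3.0]
measured_ms = [3.5, 1.5, 2.5]
cost = ot_cost_1d(simulated_ms, measured_ms)
```

The cost grows in proportion to the size of the delay shift, whereas a sample-wise amplitude comparison of two sparse RIRs with non-overlapping peaks would report the same mismatch regardless of how far the peaks have moved.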
Abstract:In this work, we introduce an optimal transport framework for inferring power distributions over both spatial location and temporal frequency. Recently, it has been shown that optimal transport is a powerful tool for estimating spatial spectra that change smoothly over time. Here, we consider the tracking of the spatio-temporal spectrum corresponding to a small number of moving broad-band signal sources. Typically, such tracking problems are addressed by treating the spatio-temporal power distribution in a frequency-by-frequency manner, which allows the use of well-understood models for narrow-band signals. This, however, leads to decreased target resolution due to inefficient use of the available information. We propose an extension of the optimal transport framework that exploits information from several frequencies simultaneously by estimating a spatio-temporal distribution penalized by a group-sparsity regularizer. This approach finds a spatial spectrum that changes smoothly over time and, at each time instance, has a small support that is similar across frequencies. To the best of the authors' knowledge, this is the first formulation combining optimal transport and sparsity for solving inverse problems. As shown on simulated and real data, our method can successfully track targets in scenarios where information from separate frequency bands alone is insufficient.
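A group-sparsity regularizer of the kind mentioned above can be sketched as follows, assuming the common group-lasso form in which each spatial location forms one group across frequencies. The function names and the `power[location][frequency]` layout are illustrative assumptions, and the exact penalty used in the paper may differ.

```python
import math


def group_sparsity_penalty(power):
    """Sum over locations of the l2 norm taken across frequencies.

    `power[location][frequency]` is an assumed layout for this sketch.
    """
    return sum(math.sqrt(sum(v * v for v in row)) for row in power)


def group_soft_threshold(power, threshold):
    """Proximal operator of `threshold * group_sparsity_penalty`.

    Each location's row is shrunk towards zero as a whole, so a location is
    either kept at all frequencies or removed at all frequencies.
    """
    shrunk = []
    for row in power:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0.0 else 0.0
        shrunk.append([scale * v for v in row])
    return shrunk


# Hypothetical spectrum: one strong location, one empty, one weak.
power = [[3.0, 4.0], [0.0, 0.0], [0.1, 0.0]]
penalty = group_sparsity_penalty(power)
shrunk = group_soft_threshold(power, threshold=1.0)
```

Shrinking whole rows rather than individual entries is what encourages a support that is shared across frequencies: the weak location is zeroed at every frequency, while the strong one is retained at every frequency.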
Abstract:Audio bandwidth extension involves the realistic reconstruction of high-frequency spectra from bandlimited observations. In cases where the lowpass degradation is unknown, such as in restoring historical audio recordings, this becomes a blind problem. This paper introduces a novel method called BABE (Blind Audio Bandwidth Extension) that addresses the blind problem in a zero-shot setting, leveraging the generative priors of a pre-trained unconditional diffusion model. During the inference process, BABE utilizes a generalized version of diffusion posterior sampling, where the degradation operator is unknown but parametrized and inferred iteratively. The performance of the proposed method is evaluated using objective and subjective metrics, and the results show that BABE surpasses state-of-the-art blind bandwidth extension baselines and achieves competitive performance compared to non-blind filter-informed methods when tested with synthetic data. Moreover, BABE exhibits robust generalization capabilities when enhancing real historical recordings, effectively reconstructing the missing high-frequency content while maintaining coherence with the original recording. Subjective preference tests confirm that BABE significantly improves the audio quality of historical music recordings. Examples of historical recordings restored with the proposed method are available on the companion webpage: (http://research.spa.aalto.fi/publications/papers/ieee-taslp-babe/)
Abstract:Distributed signal-processing algorithms in (wireless) sensor networks often aim to decentralize processing tasks to reduce communication cost and computational complexity or avoid reliance on a single device (i.e., fusion center) for processing. In this contribution, we extend a distributed adaptive algorithm for blind system identification that relies on the estimation of a stacked network-wide consensus vector at each node, the computation of which requires either broadcasting or relaying of node-specific values (i.e., local vector norms) to all other nodes. The extended algorithm employs a distributed-averaging-based scheme to estimate the network-wide consensus norm value using only the local vector norms provided by neighboring sensor nodes. We introduce an adaptive mixing factor between instantaneous and recursive estimates of these norms for adaptivity in a time-varying system. Simulation results show that the extension provides estimation results close to the optimal fully-connected-network or broadcasting case while reducing inter-node transmission significantly.
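The idea of estimating a network-wide norm from neighbor exchanges alone can be illustrated with a textbook linear consensus iteration. This is a minimal sketch under stated assumptions: a static five-node ring, a fixed step size, and squared norms as the exchanged quantity. The function name and all values are hypothetical, and the scheme in the paper, including its adaptive mixing factor, is not reproduced here.

```python
def distributed_average(values, neighbors, step, iterations):
    """Linear consensus: each node repeatedly moves towards its neighbors.

    Converges to the network-wide average on a connected graph when `step`
    is below 1 / (maximum node degree).
    """
    x = list(values)
    for _ in range(iterations):
        # Every node uses only its own value and those of its neighbors.
        x = [xi + step * sum(x[j] - xi for j in neighbors[i])
             for i, xi in enumerate(x)]
    return x


# Hypothetical local squared vector norms held by five nodes on a ring.
local_sq_norms = [1.0, 4.0, 9.0, 16.0, 25.0]
num_nodes = len(local_sq_norms)
ring = {i: [(i - 1) % num_nodes, (i + 1) % num_nodes] for i in range(num_nodes)}

estimates = distributed_average(local_sq_norms, ring, step=0.3, iterations=200)

# The squared norm of the stacked network-wide vector is the sum of the local
# squared norms, i.e. the number of nodes times the consensus average.
network_sq_norms = [num_nodes * e for e in estimates]
```

In this example every node ends up with approximately the same value as it would obtain if all nodes broadcast their norms, while each node only ever communicates with its two ring neighbors.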
Abstract:In this short paper, we describe an efficient numerical solver for the optimal sampling problem considered in "Designing Sampling Schemes for Multi-Dimensional Data". An implementation may be found at https://www.maths.lu.se/staff/andreas-jakobsson/publications/.
Abstract:In this work, we consider the problem of bounding the values of a covariance function corresponding to a continuous-time stationary stochastic process or signal. Specifically, for two signals whose covariance functions agree on a finite discrete set of time-lags, we consider the maximal possible discrepancy of the covariance functions for real-valued time-lags outside this discrete grid. Computing this uncertainty corresponds to solving an infinite dimensional non-convex problem. However, we herein prove that the maximal objective value may be bounded from above by a finite dimensional convex optimization problem, allowing for efficient computation by standard methods. Furthermore, we empirically observe that for the case of signals whose spectra are supported on an interval, this upper bound is sharp, i.e., provides an exact quantification of the covariance uncertainty.
Abstract:In this work, we introduce a novel approach for determining a joint sparse spectrum from several non-uniformly sampled data sets, where each data set is assumed to have its own, possibly disjoint, and only partially known, sampling times. The potential of the proposed approach is illustrated using a spectral estimation problem in paleoclimatology. In this problem, each data point derives from a separate ice core measurement, so that even though all measurements reflect the same periodicities, the sampling times and phases differ among the data sets. In addition, sampling times are only approximately known. The resulting joint estimate exploiting all available data is formulated using a sparse reconstruction framework allowing for a reliable and robust estimate of the underlying periodicities. The corresponding misspecified Cram\'er-Rao lower bound, accounting for the expected sampling uncertainties, is derived and the proposed method is shown to attain the resulting bound when the signal-to-noise ratio is sufficiently high. The performance of the proposed method is compared with that of other commonly used approaches using both simulated and measured ice core data sets.
Abstract:The estimation of the covariance function of a stochastic process, or signal, is of integral importance for a multitude of signal processing applications. In this work, we derive closed-form expressions for the variance of covariance estimates for mixed-spectrum signals, i.e., spectra containing both absolutely continuous and singular parts. The results cover both finite-sample and asymptotic regimes, allowing for assessing the exact speed of convergence of estimates to their expectations, as well as their limiting behavior. As is shown, such covariance estimates may converge even for non-ergodic processes. Furthermore, we consider approximating signals with arbitrary spectral densities by sequences of singular spectrum, i.e., sinusoidal, processes, and derive the limiting behavior of covariance estimates as both the sample size and the number of sinusoidal components tend to infinity. We show that the asymptotic regime variance can be described by a time-frequency resolution product, with dramatically different behavior depending on how the sinusoidal approximation is constructed. In a few numerical examples we illustrate the theory and the corresponding implications for direction of arrival estimation.
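For reference, the kind of covariance estimate discussed above can be sketched with the standard biased sample covariance estimator applied to a single sinusoid, the simplest signal with a purely singular spectrum. The estimator choice, the function name, and the signal parameters are illustrative assumptions and are not taken from the paper.

```python
import math


def sample_covariance(x, lag):
    """Biased sample covariance at a non-negative lag.

    Assumes a zero-mean signal, so no sample mean is subtracted.
    """
    n = len(x)
    return sum(x[t] * x[t + lag] for t in range(n - lag)) / n


# Hypothetical sinusoid with fixed amplitude, frequency, and phase.
amplitude, omega, phase = 2.0, 0.3, 0.7
num_samples, lag = 20000, 5
x = [amplitude * math.cos(omega * t + phase) for t in range(num_samples)]

estimate = sample_covariance(x, lag)
# Large-sample value for a sinusoid: (A^2 / 2) * cos(omega * lag).
limit = 0.5 * amplitude ** 2 * math.cos(omega * lag)
```

For this signal the estimate approaches the stated value at a rate on the order of one over the number of samples, independently of the phase, which gives a concrete instance of the convergence behavior that the closed-form variance expressions quantify.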