Abstract:Separating sources is a common challenge in applications such as speech enhancement and telecommunications, where distinguishing between overlapping sounds helps reduce interference and improve signal quality. Additionally, in multichannel systems, correct calibration and synchronization are essential to separate and locate source signals accurately. This work introduces a method for blind source separation and estimation of the Time Difference of Arrival (TDOA) of signals in the time-frequency domain. Our proposed method effectively separates signal mixtures into their original source spectrograms while simultaneously estimating the relative delays between receivers, using Optimal Transport (OT) theory. By exploiting the structure of the OT problem, we combine the separation and delay estimation processes into a unified framework, optimizing the system through a block coordinate descent algorithm. We analyze the performance of the OT-based estimator under various noise conditions and compare it with conventional TDOA and source separation methods. Numerical simulation results demonstrate that our proposed approach can achieve a significant level of accuracy across diverse noise scenarios for physical speech signals in both TDOA and source separation tasks.
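For context on the conventional TDOA baselines mentioned above, the following is a minimal sketch of delay estimation by plain cross-correlation. It is not the OT-based estimator of the abstract. The function name `estimate_delay`, the synthetic test signal, and the 3-sample delay are all illustrative assumptions.

```python
import random


def estimate_delay(x, y, max_lag):
    """Return the lag d (in samples) that maximizes sum_n x[n] * y[n + d].

    A positive result means that y is a delayed copy of x.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate only over the samples where both signals are defined.
        score = sum(
            x[n] * y[n + lag]
            for n in range(len(x))
            if 0 <= n + lag < len(y)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical two-receiver example: receiver 2 hears the source 3 samples late.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
true_delay = 3
y = [0.0] * true_delay + x[:-true_delay]

estimated = estimate_delay(x, y, max_lag=10)
```

With several overlapping sources, the cross-correlation shows one peak per source and does not say which peak belongs to which source. That ambiguity is the reason separation and delay estimation are treated jointly in the abstract.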
Abstract:In this work, we consider the problem of jointly estimating a set of room impulse responses (RIRs) corresponding to closely spaced microphones. The accurate estimation of RIRs is crucial in acoustic applications such as speech enhancement, noise cancellation, and auralization. However, real-world constraints such as short excitation signals, low signal-to-noise ratios, and poor spectral excitation often render the estimation problem ill-posed. In this paper, we address these challenges by means of optimal mass transport (OMT) regularization. In particular, we propose to use an OMT barycenter, or generalized mean, as a mechanism for information sharing between the microphones. This allows us to quantify and exploit similarities in the delay structures between the different microphones without having to impose rigid assumptions on the room acoustics. The resulting estimator is formulated in terms of the solution to a convex optimization problem which can be implemented using standard solvers. In numerical examples, we demonstrate the potential of the proposed method in addressing otherwise ill-conditioned estimation scenarios.
Abstract:This paper presents a novel approach to audio restoration, focusing on the enhancement of low-quality music recordings, and in particular historical ones. Building upon a previous algorithm called BABE, or Blind Audio Bandwidth Extension, we introduce BABE-2, which presents a series of significant improvements. This research broadens the concept of bandwidth extension to generative equalization, a novel task that, to the best of our knowledge, has not been explicitly addressed in previous studies. BABE-2 is built around an optimization algorithm utilizing priors from diffusion models, which are trained or fine-tuned using a curated set of high-quality music tracks. The algorithm simultaneously performs two critical tasks: estimation of the filter degradation magnitude response and hallucination of the restored audio. The proposed method is objectively evaluated on historical piano recordings, showing a marked enhancement over the prior version. The method yields similarly impressive results in rejuvenating the works of renowned vocalists Enrico Caruso and Nellie Melba. This research represents an advancement in the practical restoration of historical music.
Abstract:In this work, we consider the problem of localizing multiple signal sources based on time-difference of arrival (TDOA) measurements. In the blind setting, in which the source signals are not known, the localization task is challenging due to the data association problem. That is, it is not known which of the TDOA measurements correspond to the same source. Herein, we propose to perform joint localization and data association by means of an optimal transport formulation. The method operates by finding optimal groupings of TDOA measurements and associating these with candidate source locations. To allow for computationally feasible localization in three-dimensional space, an efficient set of candidate locations is constructed using a minimal multilateration solver based on minimal sets of receiver pairs. In numerical simulations, we demonstrate that the proposed method is robust both to measurement noise and TDOA detection errors. Furthermore, it is shown that the data association provided by the proposed method allows for statistically efficient estimates of the source locations.
Abstract:The ability to accurately estimate room impulse responses (RIRs) is integral to many applications of spatial audio processing. Regrettably, estimating the RIR using ambient signals, such as speech or music, remains a challenging problem due to, e.g., low signal-to-noise ratios, finite sample lengths, and poor spectral excitation. Commonly, in order to improve the conditioning of the estimation problem, priors are placed on the amplitudes of the RIR. Although serving as a regularizer, this type of prior is generally not useful when only approximate knowledge of the delay structure is available, which, for example, is the case when the prior is a simulated RIR from an approximation of the room geometry. In this work, we target the delay structure itself, constructing a prior based on the concept of optimal transport. As illustrated using both simulated and measured data, the resulting method is able to beneficially incorporate information even from simple simulation models, displaying considerable robustness to perturbations in the assumed room dimensions and temperature.
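To illustrate why a transport-based prior can tolerate approximate delay knowledge, the sketch below compares two distances between nonnegative delay profiles on a common sample grid. It assumes unit-mass energy profiles and the one-dimensional Wasserstein-1 distance; the formulation actually used in the work may differ. The helper names `wasserstein1`, `squared_l2`, and `spike` are hypothetical.

```python
def wasserstein1(p, q, spacing=1.0):
    """1-D Wasserstein-1 distance between two nonnegative profiles.

    Both profiles are normalized to unit mass; in one dimension the distance
    equals the integrated absolute difference of the cumulative sums.
    """
    mass_p, mass_q = sum(p), sum(q)
    cdf_p = cdf_q = 0.0
    total = 0.0
    for a, b in zip(p, q):
        cdf_p += a / mass_p
        cdf_q += b / mass_q
        total += abs(cdf_p - cdf_q) * spacing
    return total


def squared_l2(p, q):
    """Sample-wise squared Euclidean distance, for comparison."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def spike(length, position):
    """Profile with a single unit-energy arrival at the given sample."""
    return [1.0 if i == position else 0.0 for i in range(length)]


reference = spike(16, 4)   # arrival assumed by a simulated prior
near = spike(16, 5)        # true arrival, off by 1 sample
far = spike(16, 10)        # true arrival, off by 6 samples

ot_near, ot_far = wasserstein1(reference, near), wasserstein1(reference, far)
l2_near, l2_far = squared_l2(reference, near), squared_l2(reference, far)
```

In this toy case the transport distance grows with the size of the delay error (1 sample versus 6 samples), whereas the sample-wise distance is the same for both errors and therefore cannot tell a nearly correct prior from a badly wrong one.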
Abstract:In this work, we introduce an optimal transport framework for inferring power distributions over both spatial location and temporal frequency. Recently, it has been shown that optimal transport is a powerful tool for estimating spatial spectra that change smoothly over time. In this work, we consider the tracking of the spatio-temporal spectrum corresponding to a small number of moving broad-band signal sources. Typically, such tracking problems are addressed by treating the spatio-temporal power distribution in a frequency-by-frequency manner, which allows the use of well-understood models for narrow-band signals. This, however, leads to decreased target resolution due to inefficient use of the available information. We propose an extension of the optimal transport framework that exploits information from several frequencies simultaneously by estimating a spatio-temporal distribution penalized by a group-sparsity regularizer. This approach finds a spatial spectrum that changes smoothly over time and, at each time instance, has a small support that is similar across frequencies. To the best of the authors' knowledge, this is the first formulation combining optimal transport and sparsity for solving inverse problems. As is shown on simulated and real data, our method can successfully track targets in scenarios where information from separate frequency bands alone is insufficient.
Abstract:Audio bandwidth extension involves the realistic reconstruction of high-frequency spectra from bandlimited observations. In cases where the lowpass degradation is unknown, such as in restoring historical audio recordings, this becomes a blind problem. This paper introduces a novel method called BABE (Blind Audio Bandwidth Extension) that addresses the blind problem in a zero-shot setting, leveraging the generative priors of a pre-trained unconditional diffusion model. During the inference process, BABE utilizes a generalized version of diffusion posterior sampling, where the degradation operator is unknown but parametrized and inferred iteratively. The performance of the proposed method is evaluated using objective and subjective metrics, and the results show that BABE surpasses state-of-the-art blind bandwidth extension baselines and achieves competitive performance compared to non-blind filter-informed methods when tested with synthetic data. Moreover, BABE exhibits robust generalization capabilities when enhancing real historical recordings, effectively reconstructing the missing high-frequency content while maintaining coherence with the original recording. Subjective preference tests confirm that BABE significantly improves the audio quality of historical music recordings. Examples of historical recordings restored with the proposed method are available on the companion webpage: http://research.spa.aalto.fi/publications/papers/ieee-taslp-babe/
Abstract:Distributed signal-processing algorithms in (wireless) sensor networks often aim to decentralize processing tasks to reduce communication cost and computational complexity or avoid reliance on a single device (i.e., fusion center) for processing. In this contribution, we extend a distributed adaptive algorithm for blind system identification that relies on the estimation of a stacked network-wide consensus vector at each node, the computation of which requires either broadcasting or relaying of node-specific values (i.e., local vector norms) to all other nodes. The extended algorithm employs a distributed-averaging-based scheme to estimate the network-wide consensus norm value by only using the local vector norm provided by neighboring sensor nodes. We introduce an adaptive mixing factor between instantaneous and recursive estimates of these norms for adaptivity in a time-varying system. Simulation results show that the extension provides estimation results close to the optimal fully-connected-network or broadcasting case while reducing inter-node transmission significantly.
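The neighbor-only averaging idea can be sketched with a standard consensus iteration, shown below. This is a generic distributed-averaging scheme and is not claimed to be the algorithm of the paper. The ring topology, the step size, the local values, and the function name `distributed_average` are illustrative assumptions.

```python
def distributed_average(values, neighbors, step=0.3, iterations=200):
    """Run a synchronous consensus iteration on an undirected graph.

    In each round, every node moves its estimate toward the estimates of its
    direct neighbors only. For a connected graph and a sufficiently small
    step, all estimates converge to the network-wide mean.
    """
    x = list(values)
    for _ in range(iterations):
        # The right-hand side reads only the previous round's estimates.
        x = [
            xi + step * sum(x[j] - xi for j in neighbors[i])
            for i, xi in enumerate(x)
        ]
    return x


# Hypothetical 4-node ring; each node starts from its own local squared norm.
local_sq_norms = [1.0, 4.0, 9.0, 2.0]
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

estimates = distributed_average(local_sq_norms, ring)
# Assumption: the network-wide squared norm is the sum of the local ones,
# which each node can recover as (number of nodes) * (its mean estimate).
network_sq_norm = [len(local_sq_norms) * e for e in estimates]
```

No node transmits to anyone other than its two ring neighbors, yet every node ends up with the same estimate that broadcasting all local values would give.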
Abstract:In this short paper, we describe an efficient numerical solver for the optimal sampling problem considered in "Designing Sampling Schemes for Multi-Dimensional Data". An implementation may be found at https://www.maths.lu.se/staff/andreas-jakobsson/publications/.
Abstract:In this work, we consider the problem of bounding the values of a covariance function corresponding to a continuous-time stationary stochastic process or signal. Specifically, for two signals whose covariance functions agree on a finite discrete set of time-lags, we consider the maximal possible discrepancy of the covariance functions for real-valued time-lags outside this discrete grid. Computing this uncertainty corresponds to solving an infinite dimensional non-convex problem. However, we herein prove that the maximal objective value may be bounded from above by a finite dimensional convex optimization problem, allowing for efficient computation by standard methods. Furthermore, we empirically observe that for the case of signals whose spectra are supported on an interval, this upper bound is sharp, i.e., provides an exact quantification of the covariance uncertainty.