Sebastian J. Schlecht

Joint Spectrogram Separation and TDOA Estimation using Optimal Transport

Mar 24, 2025

Linda Fabiani, Sebastian J. Schlecht, Isabel Haasler, Filip Elvander

Abstract: Separating sources is a common challenge in applications such as speech enhancement and telecommunications, where distinguishing between overlapping sounds helps reduce interference and improve signal quality. Additionally, in multichannel systems, correct calibration and synchronization are essential to separate and locate source signals accurately. This work introduces a method for blind source separation and estimation of the Time Difference of Arrival (TDOA) of signals in the time-frequency domain. Our proposed method effectively separates signal mixtures into their original source spectrograms while simultaneously estimating the relative delays between receivers, using Optimal Transport (OT) theory. By exploiting the structure of the OT problem, we combine the separation and delay estimation processes into a unified framework, optimizing the system through a block coordinate descent algorithm. We analyze the performance of the OT-based estimator under various noise conditions and compare it with conventional TDOA and source separation methods. Numerical simulation results demonstrate that our proposed approach can achieve a significant level of accuracy across diverse noise scenarios for physical speech signals in both TDOA and source separation tasks.
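
For orientation, the abstract above compares against conventional TDOA methods. The following is a minimal sketch of one such conventional baseline, plain cross-correlation between two receiver signals. It is not the paper's OT-based estimator; the function name `estimate_delay_xcorr` and the synthetic test signal are assumptions made for illustration.

```python
import numpy as np


def estimate_delay_xcorr(x, y):
    """Return the lag (in samples) at which y best aligns with x.

    Hypothetical helper: a plain cross-correlation baseline,
    not the OT-based method described in the abstract.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Full cross-correlation; output index i corresponds to lag i - (len(x) - 1).
    xcorr = np.correlate(y, x, mode="full")
    return int(np.argmax(xcorr)) - (len(x) - 1)


# Synthetic example: the second receiver hears the same noise burst 7 samples later.
rng = np.random.default_rng(0)
reference = rng.standard_normal(1024)
true_delay = 7
delayed = np.concatenate([np.zeros(true_delay), reference[:-true_delay]])
estimated_delay = estimate_delay_xcorr(reference, delayed)
```

A positive result means the second signal lags the first. Dividing the lag by the sampling rate converts it to seconds.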

MoD-ART: Modal Decomposition of Acoustic Radiance Transfer

Dec 05, 2024

Matteo Scerbo, Sebastian J. Schlecht, Randall Ali, Lauri Savioja, Enzo De Sena

Abstract: Modeling late reverberation at interactive speeds is a challenging task when multiple sound sources and listeners are present in the same environment. This is especially problematic when the environment is geometrically complex and/or features uneven energy absorption (e.g. coupled volumes), because in such cases the late reverberation is dependent on the sound sources' and listeners' positions, and therefore must be adapted to their movements in real time. We present a novel approach to the task, named modal decomposition of Acoustic Radiance Transfer (MoD-ART), which can handle highly complex scenarios with efficiency. The approach is based on the geometrical acoustics method of Acoustic Radiance Transfer, from which we extract a set of energy decay modes and their positional relationships with sources and listeners. In this paper, we describe the physical and mathematical meaning of MoD-ART, highlighting its advantages and applicability to different scenarios. Through an analysis of the method's computational complexity, we show that it compares very favourably with ray-tracing. We also present simulation results showing that MoD-ART can capture multiple decay slopes and flutter echoes.

FLAMO: An Open-Source Library for Frequency-Domain Differentiable Audio Processing

Sep 13, 2024

Gloria Dal Santo, Gian Marco De Bortoli, Karolina Prawda, Sebastian J. Schlecht, Vesa Välimäki

Abstract: We present FLAMO, a Frequency-sampling Library for Audio-Module Optimization designed to implement and optimize differentiable linear time-invariant audio systems. The library is open-source and built on the frequency-sampling filter design method, allowing for the creation of differentiable modules that can be used stand-alone or within the computation graph of neural networks, simplifying the development of differentiable audio systems. It includes predefined filtering modules and auxiliary classes for constructing, training, and logging the optimized systems, all accessible through an intuitive interface. Practical application of these modules is demonstrated through two case studies: the optimization of an artificial reverberator and an active acoustics system for improved response smoothness.
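
As a rough illustration of the frequency-sampling idea mentioned above, the sketch below evaluates a filter's transfer function on a uniform grid of frequencies and applies it by multiplication in the frequency domain. It uses plain NumPy and does not reproduce FLAMO's API; the function name `frequency_sampled_filter` and the one-pole example are assumptions.

```python
import numpy as np


def frequency_sampled_filter(b, a, x, nfft):
    """Filter x with H(z) = B(z) / A(z), sampled at nfft points on the unit circle.

    Hypothetical helper, not part of FLAMO. The product in the frequency
    domain is a circular convolution, so nfft must be long enough for the
    impulse response to decay (otherwise the tail wraps around in time).
    """
    b = np.asarray(b, dtype=float)
    a = np.asarray(a, dtype=float)
    # Sample numerator and denominator polynomials on the frequency grid.
    transfer = np.fft.rfft(b, n=nfft) / np.fft.rfft(a, n=nfft)
    spectrum = np.fft.rfft(np.asarray(x, dtype=float), n=nfft)
    return np.fft.irfft(transfer * spectrum, n=nfft)


# One-pole lowpass y[n] = x[n] + 0.5 * y[n-1], driven by a unit impulse.
impulse = np.zeros(256)
impulse[0] = 1.0
response = frequency_sampled_filter([1.0], [1.0, -0.5], impulse, nfft=256)
```

Because every step is an elementwise operation or an FFT, the same computation can be expressed in an automatic-differentiation framework, which is what makes the approach attractive for gradient-based optimization.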

Similarity Metrics For Late Reverberation

Aug 27, 2024

Gloria Dal Santo, Karolina Prawda, Sebastian J. Schlecht, Vesa Välimäki

Abstract: Automatic tuning of reverberation algorithms relies on the optimization of a cost function. While general audio similarity metrics are useful, they are not optimized for the specific statistical properties of reverberation in rooms. This paper presents two novel metrics for assessing the similarity of late reverberation in room impulse responses. These metrics are differentiable and can be utilized within a machine-learning framework. We compare the performance of these metrics to two popular audio metrics using a large dataset of room impulse responses encompassing various room configurations and microphone positions. The results indicate that the proposed functions based on averaged power and frequency-band energy decay outperform the baselines, with the former exhibiting the most suitable profile towards the minimum. The proposed work holds promise as an improvement to the design and evaluation of reverberation similarity metrics.

Fade-in Reverberation in Multi-room Environments Using the Common-Slope Model

Jul 18, 2024

Kyung Yun Lee, Nils Meyer-Kahlen, Georg Götz, U. Peter Svensson, Sebastian J. Schlecht, Vesa Välimäki

Abstract: In multi-room environments, modelling the sound propagation is complex due to the coupling of rooms and diverse source-receiver positions. A common scenario is when the source and the receiver are in different rooms without a clear line of sight. For such source-receiver configurations, an initial increase in energy is observed, referred to as the "fade-in" of reverberation. Building on recent work on representing inhomogeneous and anisotropic reverberation with common decay times, this work proposes an extended parametric model that enables the modelling of the fade-in phenomenon. The method performs fitting on the envelopes, instead of energy decay functions, and allows negative amplitudes of decaying exponentials. We evaluate the method on simulated and measured multi-room environments, where we show that the proposed approach can now model the fade-ins that were unrealisable with the previous method.

* 2024 AES 5th International Conference on Audio for Virtual and Augmented Reality
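
To see why negative amplitudes allow a fade-in, consider a toy envelope made of two decaying exponentials, where the faster one enters with a negative sign. This is only a sketch of the idea stated in the abstract: the function name `decay_envelope`, the decay times, and the amplitudes are illustrative assumptions, not values or code from the paper.

```python
import numpy as np


def decay_envelope(t, amplitudes, decay_times):
    """Sum of decaying exponentials; each term falls 60 dB in amplitude after its decay time.

    Hypothetical helper for illustration only.
    """
    t = np.asarray(t, dtype=float)[:, None]
    amplitudes = np.asarray(amplitudes, dtype=float)[None, :]
    decay_times = np.asarray(decay_times, dtype=float)[None, :]
    rate = 3.0 * np.log(10.0)  # exp(-rate) is a factor of 10^-3, i.e. -60 dB in amplitude
    return np.sum(amplitudes * np.exp(-rate * t / decay_times), axis=1)


t = np.linspace(0.0, 2.0, 2001)
# Slow decay (1.5 s) with positive amplitude, fast decay (0.3 s) with negative amplitude.
envelope = decay_envelope(t, amplitudes=[1.0, -1.0], decay_times=[1.5, 0.3])
peak_time = t[np.argmax(envelope)]
```

The envelope starts at zero, rises while the fast negative term dies out, and then decays with the slow term. With strictly positive amplitudes the sum could only decrease from its initial value.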

Feedback Delay Network Optimization

Feb 17, 2024

Gloria Dal Santo, Karolina Prawda, Sebastian J. Schlecht, Vesa Välimäki

Abstract: A common bane of artificial reverberation algorithms is spectral coloration, typically manifesting as metallic ringing, leading to a degradation in the perceived sound quality. This paper presents an optimization framework where a differentiable feedback delay network is used to learn a set of parameters to reduce coloration iteratively. The parameters under optimization include the feedback matrix, as well as the input and output gains. The optimization objective is twofold: to maximize spectral flatness through a spectral loss while maintaining temporal density by penalizing sparseness in the parameter values. A favorable narrower distribution of modal excitation is achieved while maintaining the desired impulse response density. In a subjective assessment, the new method proves effective in reducing perceptual coloration of late reverberation. The proposed method achieves computational savings compared to the baseline while preserving its performance. The effectiveness of this work is demonstrated through two application scenarios where natural-sounding synthetic impulse responses are obtained via the introduction of attenuation filters and an optimizable scattering feedback matrix.
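
The abstract does not spell out its spectral loss. As a loose illustration of what "spectral flatness" can mean, the sketch below computes the textbook flatness measure, the ratio of the geometric to the arithmetic mean of the power spectrum. This is a generic measure and an assumption on my part, not the loss used in the paper; the name `spectral_flatness` and the test signals are likewise illustrative.

```python
import numpy as np


def spectral_flatness(signal, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum (1.0 means perfectly flat).

    Generic textbook measure, not the paper's spectral loss.
    """
    power = np.abs(np.fft.rfft(signal)) ** 2 + eps  # eps avoids log(0)
    geometric_mean = np.exp(np.mean(np.log(power)))
    arithmetic_mean = np.mean(power)
    return geometric_mean / arithmetic_mean


n = 512
impulse = np.zeros(n)
impulse[0] = 1.0  # flat magnitude spectrum
tone = np.sin(2.0 * np.pi * 32 * np.arange(n) / n)  # energy in a single bin

flat_score = spectral_flatness(impulse)
tone_score = spectral_flatness(tone)
```

A response dominated by a few strong resonances, as in metallic ringing, scores closer to the tone than to the impulse.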

Deep Room Impulse Response Completion

Feb 01, 2024

Jackie Lin, Georg Götz, Sebastian J. Schlecht

Abstract: Rendering immersive spatial audio in virtual reality (VR) and video games demands a fast and accurate generation of room impulse responses (RIRs) to recreate auditory environments plausibly. However, the conventional methods for simulating or measuring long RIRs are either computationally intensive or challenged by low signal-to-noise ratios. This study is propelled by the insight that direct sound and early reflections encapsulate sufficient information about room geometry and absorption characteristics. Building upon this premise, we propose a novel task termed "RIR completion," aimed at synthesizing the late reverberation given only the early portion (50 ms) of the response. To this end, we introduce DECOR, Deep Exponential Completion Of Room impulse responses, a deep neural network structured as an autoencoder designed to predict multi-exponential decay envelopes of filtered noise sequences. The interpretability of DECOR's output facilitates its integration with diverse rendering techniques. The proposed method is compared against an adapted state-of-the-art network, and its comparable performance supports the feasibility of the RIR completion task. The RIR completion can be widely adapted to enhance RIR generation tasks where fast late reverberation approximation is required.

* The following article has been submitted to the EURASIP Journal on Audio, Speech, and Music Processing

Damping Density of an Absorptive Shoebox Room Derived from the Image-Source Method

Oct 11, 2023

Sebastian J. Schlecht, Karolina Prawda, Rudolf Rabenstein, Maximilian Schäfer

Abstract: The image-source method is widely applied to compute room impulse responses (RIRs) of shoebox rooms with arbitrary absorption. However, with increasing RIR lengths, the number of image sources grows rapidly, leading to slow computation. In this paper, we derive a closed-form expression for the damping density, which characterizes the overall multi-slope energy decay. The omnidirectional energy decay over time is directly derived from the damping density. The resulting energy decay model accurately matches the late reverberation simulated via the image-source method. The proposed model allows the fast stochastic synthesis of late reverberation by shaping noise with the energy envelope. Simulations of various wall damping coefficients demonstrate the model's accuracy. The proposed model consistently achieves higher energy decay prediction accuracy than a state-of-the-art approximation method. The paper elaborates on the proposed damping density's applicability to modeling multi-sloped sound energy decay, predicting reverberation time in non-diffuse sound fields, and fast frequency-dependent RIR synthesis.
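
The "shaping noise with the energy envelope" step can be sketched as follows: Gaussian noise is scaled sample by sample with the square root of an energy envelope. The single-slope exponential envelope used here is a stand-in assumption; the paper derives its envelope from the damping density, which is not reproduced. The function name `synthesize_late_reverb` and all parameter values are hypothetical.

```python
import numpy as np


def synthesize_late_reverb(energy_envelope, seed=0):
    """Shape white Gaussian noise so its expected energy follows energy_envelope.

    Hypothetical helper; the envelope itself must come from a decay model.
    """
    energy_envelope = np.asarray(energy_envelope, dtype=float)
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(len(energy_envelope))
    # Amplitude scales with the square root of energy.
    return noise * np.sqrt(energy_envelope)


fs = 16000                # sampling rate in Hz (assumed)
t = np.arange(fs) / fs    # one second of samples
rt60 = 0.5                # reverberation time in seconds (assumed)
energy_envelope = 10.0 ** (-6.0 * t / rt60)  # energy falls by 60 dB after rt60
tail = synthesize_late_reverb(energy_envelope)
```

A multi-slope decay only changes how `energy_envelope` is computed; the noise-shaping step stays the same.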

Neural network for multi-exponential sound energy decay analysis

May 19, 2022

Georg Götz, Ricardo Falcón Pérez, Sebastian J. Schlecht, Ville Pulkki

Abstract: An established model for sound energy decay functions (EDFs) is the superposition of multiple exponentials and a noise term. This work proposes a neural-network-based approach for estimating the model parameters from EDFs. The network is trained on synthetic EDFs and evaluated on two large datasets of over 20000 EDF measurements conducted in various acoustic environments. The evaluation shows that the proposed neural network architecture robustly estimates the model parameters from large datasets of measured EDFs, while being lightweight and computationally efficient. An implementation of the proposed neural network is publicly available.

* The following article has been submitted to the Journal of the Acoustical Society of America (JASA). After it is published, it will be found at http://asa.scitation.org/journal/jas
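
A minimal sketch of the kind of EDF model the abstract refers to, a sum of decaying exponentials plus a noise term, is given below. The exact parameterization is an assumption: each exponential is written so that it drops 60 dB in energy after its decay time, and the noise term is modeled as a linearly decreasing ramp, as results from backward integration of a constant noise floor. The function name and parameter values are hypothetical and may differ from the paper's formulation.

```python
import numpy as np


def multi_exponential_edf(t, amplitudes, decay_times, noise_level=0.0):
    """Energy decay function: sum of exponentials plus a backward-integrated noise floor.

    Hypothetical parameterization for illustration only.
    """
    t = np.asarray(t, dtype=float)
    rate = 6.0 * np.log(10.0)  # exp(-rate) is a factor of 10^-6, i.e. -60 dB in energy
    edf = np.zeros_like(t)
    for amplitude, decay_time in zip(amplitudes, decay_times):
        edf += amplitude * np.exp(-rate * t / decay_time)
    # Constant noise integrated backwards from the end gives a linear ramp.
    edf += noise_level * (t[-1] - t)
    return edf


t = np.linspace(0.0, 2.0, 201)
edf = multi_exponential_edf(
    t, amplitudes=[1.0, 0.1], decay_times=[0.4, 1.5], noise_level=1e-4
)
edf_db = 10.0 * np.log10(edf / edf[0])  # normalized decay curve in dB
```

Estimating the model parameters then means recovering `amplitudes`, `decay_times`, and `noise_level` from a measured curve such as `edf_db`.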

Physical Modeling using Recurrent Neural Networks with Fast Convolutional Layers

Apr 21, 2022

Julian D. Parker, Sebastian J. Schlecht, Rudolf Rabenstein, Maximilian Schäfer

Abstract: Discrete-time modeling of acoustic, mechanical and electrical systems is a prominent topic in the musical signal processing literature. Such models are mostly derived by discretizing a mathematical model, given in terms of ordinary or partial differential equations, using established techniques. Recent work has applied machine-learning techniques to construct such models automatically from data for systems that have lumped states described by scalar values, such as electrical circuits. In this work, we examine how similar techniques are able to construct models of systems that have spatially distributed rather than lumped states. We describe several novel recurrent neural network structures, and show how they can be thought of as an extension of modal techniques. As a proof of concept, we generate synthetic data for three physical systems and show that the proposed network structures can be trained with this data to reproduce the behavior of these systems.

* Submitted to DAFx2022
