Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Toon van Waterschoot

Tracking of Spatially Dynamic Room Impulse Responses Along Locally Linearized Trajectories

Jun 13, 2025

Kathleen MacWilliam, Thomas Dietzen, Toon van Waterschoot

Abstract:Measuring room impulse responses (RIRs) at multiple spatial points is a time-consuming task, while simulations require detailed knowledge of the room's acoustic environment. In prior work, we proposed a method for estimating the early part of RIRs along a linear trajectory in a time-varying acoustic scenario involving a static sound source and a microphone moving at constant velocity. This approach relies on measured RIRs at the start and end points of the trajectory and assumes that the time intervals occupied by the direct sound and individual reflections along the trajectory are non-overlapping. The method's applicability is therefore restricted to relatively small areas within a room, and its performance has yet to be validated with real-world data. In this paper, we propose a practical extension of the method to more realistic scenarios by segmenting longer trajectories into smaller linear intervals where the assumptions approximately hold. Applying the method piecewise along these segments extends its applicability to more complex room environments. We demonstrate its effectiveness using the trajectoRIR database, which includes moving microphone recordings and RIR measurements at discrete points along a controlled L-shaped trajectory in a real room.

* 8 pages, 6 figures. Accepted paper for conference: Forum Acousticum Euronoise 2025 (fa-euronoise2025)

Via

Access Paper or Ask Questions

Sound Field Reconstruction Using Physics-Informed Boundary Integral Networks

Jun 04, 2025

Stefano Damiano, Toon van Waterschoot

Abstract:Sound field reconstruction refers to the problem of estimating the acoustic pressure field over an arbitrary region of space, using only a limited set of measurements. Physics-informed neural networks have been adopted to solve the problem by incorporating in the training loss function the governing partial differential equation, either the Helmholtz or the wave equation. In this work, we introduce a boundary integral network for sound field reconstruction. Relying on the Kirchhoff-Helmholtz boundary integral equation to model the sound field in a given region of space, we employ a shallow neural network to retrieve the pressure distribution on the boundary of the considered domain, enabling to accurately retrieve the acoustic pressure inside of it. Assuming the positions of measurement microphones are known, we train the model by minimizing the mean squared error between the estimated and measured pressure at those locations. Experimental results indicate that the proposed model outperforms existing physics-informed data-driven techniques.

* Accepted for publication at EUSIPCO 2025

Via

Access Paper or Ask Questions

Deep, data-driven modeling of room acoustics: literature review and research perspectives

Apr 22, 2025

Toon van Waterschoot

Abstract:Our everyday auditory experience is shaped by the acoustics of the indoor environments in which we live. Room acoustics modeling is aimed at establishing mathematical representations of acoustic wave propagation in such environments. These representations are relevant to a variety of problems ranging from echo-aided auditory indoor navigation to restoring speech understanding in cocktail party scenarios. Many disciplines in science and engineering have recently witnessed a paradigm shift powered by deep learning (DL), and room acoustics research is no exception. The majority of deep, data-driven room acoustics models are inspired by DL-based speech and image processing, and hence lack the intrinsic space-time structure of acoustic wave propagation. More recently, DL-based models for room acoustics that include either geometric or wave-based information have delivered promising results, primarily for the problem of sound field reconstruction. In this review paper, we will provide an extensive and structured literature review on deep, data-driven modeling in room acoustics. Moreover, we position these models in a framework that allows for a conceptual comparison with traditional physical and data-driven models. Finally, we identify strengths and shortcomings of deep, data-driven room acoustics models and outline the main challenges for further research.

Via

Access Paper or Ask Questions

The trajectoRIR Database: Room Acoustic Recordings Along a Trajectory of Moving Microphones

Mar 29, 2025

Stefano Damiano, Kathleen MacWilliam, Valerio Lorenzoni, Thomas Dietzen, Toon van Waterschoot

Abstract:Data availability is essential to develop acoustic signal processing algorithms, especially when it comes to data-driven approaches that demand large and diverse training datasets. For this reason, an increasing number of databases have been published in recent years, including either room impulse responses (RIRs) or recordings of moving audio. In this paper we introduce the trajectoRIR database, an extensive, multi-array collection of both dynamic and stationary acoustic recordings along a controlled trajectory in a room. Specifically, the database features recordings using moving microphones and stationary RIRs spatially sampling the room acoustics along an L-shaped, 3.74-meter-long trajectory. This combination makes trajectoRIR unique and applicable in various tasks ranging from sound source localization and tracking to spatially dynamic sound field reconstruction and system identification. The recording room has a reverberation time of 0.5 seconds, and the three different microphone configurations employed include a dummy head, with additional reference microphones located next to the ears, 3 first-order Ambisonics microphones, two circular arrays of 16 and 4 channels, and a 12-channel linear array. The motion of the microphones was achieved using a robotic cart traversing a rail at three speeds: [0.2,0.4,0.8] m/s. Audio signals were reproduced using two stationary loudspeakers. The collected database features 8648 stationary RIRs, as well as perfect sweeps, speech, music, and stationary noise recorded during motion. MATLAB and Python scripts are included to access the recorded audio as well as to retrieve geometrical information.

* 15 pages, 7 figures

Via

Access Paper or Ask Questions

A Comparative Analysis of Generalised Echo and Interference Cancelling and Extended Multichannel Wiener Filtering for Combined Noise Reduction and Acoustic Echo Cancellation

Mar 05, 2025

Arnout Roebben, Toon van Waterschoot, Marc Moonen

Abstract:Two algorithms for combined acoustic echo cancellation (AEC) and noise reduction (NR) are analysed, namely the generalised echo and interference canceller (GEIC) and the extended multichannel Wiener filter (MWFext). Previously, these algorithms have been examined for linear echo paths, and assuming access to voice activity detectors (VADs) that separately detect desired speech and echo activity. However, algorithms implementing VADs may introduce detection errors. Therefore, in this paper, the previous analyses are extended by 1) modelling general nonlinear echo paths by means of the generalised Bussgang decomposition, and 2) modelling VAD error effects in each specific algorithm, thereby also allowing to model specific VAD assumptions. It is found and verified with simulations that, generally, the MWFext achieves a higher NR performance, while the GEIC achieves a more robust AEC performance.

* Accepted for publication in ICASSP 2025

Via

Access Paper or Ask Questions

A Zero-Shot Physics-Informed Dictionary Learning Approach for Sound Field Reconstruction

Dec 24, 2024

Stefano Damiano, Federico Miotello, Mirco Pezzoli, Alberto Bernardini, Fabio Antonacci, Augusto Sarti, Toon van Waterschoot

Figure 1 for A Zero-Shot Physics-Informed Dictionary Learning Approach for Sound Field Reconstruction

Figure 2 for A Zero-Shot Physics-Informed Dictionary Learning Approach for Sound Field Reconstruction

Figure 3 for A Zero-Shot Physics-Informed Dictionary Learning Approach for Sound Field Reconstruction

Abstract:Sound field reconstruction aims to estimate pressure fields in areas lacking direct measurements. Existing techniques often rely on strong assumptions or face challenges related to data availability or the explicit modeling of physical properties. To bridge these gaps, this study introduces a zero-shot, physics-informed dictionary learning approach to perform sound field reconstruction. Our method relies only on a few sparse measurements to learn a dictionary, without the need for additional training data. Moreover, by enforcing the Helmholtz equation during the optimization process, the proposed approach ensures that the reconstructed sound field is represented as a linear combination of a few physically meaningful atoms. Evaluations on real-world data show that our approach achieves comparable performance to state-of-the-art dictionary learning techniques, with the advantage of requiring only a few observations of the sound field and no training on a dataset.

* Accepted for publication at ICASSP 2025

Via

Access Paper or Ask Questions

Integrated Minimum Mean Squared Error Algorithms for Combined Acoustic Echo Cancellation and Noise Reduction

Dec 05, 2024

Arnout Roebben, Toon van Waterschoot, Jan Wouters, Marc Moonen

Figure 1 for Integrated Minimum Mean Squared Error Algorithms for Combined Acoustic Echo Cancellation and Noise Reduction

Figure 2 for Integrated Minimum Mean Squared Error Algorithms for Combined Acoustic Echo Cancellation and Noise Reduction

Figure 3 for Integrated Minimum Mean Squared Error Algorithms for Combined Acoustic Echo Cancellation and Noise Reduction

Figure 4 for Integrated Minimum Mean Squared Error Algorithms for Combined Acoustic Echo Cancellation and Noise Reduction

Abstract:In many speech recording applications, noise and acoustic echo corrupt the desired speech. Consequently, combined noise reduction (NR) and acoustic echo cancellation (AEC) is required. Generally, a cascade approach is followed, i.e., the AEC and NR are designed in isolation by selecting a separate signal model, formulating a separate cost function, and using a separate solution strategy. The AEC and NR are then cascaded one after the other, not accounting for their interaction. In this paper, however, an integrated approach is proposed to consider this interaction in a general multi-microphone/multi-loudspeaker setup. Therefore, a single signal model of either the microphone signal vector or the extended signal vector, obtained by stacking microphone and loudspeaker signals, is selected, a single mean squared error cost function is formulated, and a common solution strategy is used. Using this microphone signal model, a multi channel Wiener filter (MWF) is derived. Using the extended signal model, an extended MWF (MWFext) is derived, and several equivalent expressions are found, which nevertheless are interpretable as cascade algorithms. Specifically, the MWFext is shown to be equivalent to algorithms where the AEC precedes the NR (AEC NR), the NR precedes the AEC (NR-AEC), and the extended NR (NRext) precedes the AEC and post-filter (PF) (NRext-AECPF). Under rank-deficiency conditions the MWFext is non-unique, such that this equivalence amounts to the expressions being specific, not necessarily minimum-norm solutions for this MWFext. The practical performances nonetheless differ due to non-stationarities and imperfect correlation matrix estimation, resulting in the AEC-NR and NRext-AEC-PF attaining best overall performance.

Via

Access Paper or Ask Questions

State-Space Estimation of Spatially Dynamic Room Impulse Responses using a Room Acoustic Model-based Prior

Nov 13, 2024

Kathleen MacWilliam, Thomas Dietzen, Randall Ali, Toon van Waterschoot

Figure 1 for State-Space Estimation of Spatially Dynamic Room Impulse Responses using a Room Acoustic Model-based Prior

Figure 2 for State-Space Estimation of Spatially Dynamic Room Impulse Responses using a Room Acoustic Model-based Prior

Figure 3 for State-Space Estimation of Spatially Dynamic Room Impulse Responses using a Room Acoustic Model-based Prior

Figure 4 for State-Space Estimation of Spatially Dynamic Room Impulse Responses using a Room Acoustic Model-based Prior

Abstract:The estimation of room impulse responses (RIRs) between static loudspeaker and microphone locations can be done using a number of well-established measurement and inference procedures. While these procedures assume a time-invariant acoustic system, time variations need to be considered for the case of spatially dynamic scenarios where loudspeakers and microphones are subject to movement. If the RIR is modeled using image sources, then movement implies that the distance to each image source varies over time, making the estimation of the spatially dynamic RIR particularly challenging. In this paper, we propose a procedure to estimate the early part of the spatially dynamic RIR between a stationary source and a microphone moving on a linear trajectory at constant velocity. The procedure is built upon a state-space model, where the state to be estimated represents the early RIR, the observation corresponds to a microphone recording in a spatially dynamic scenario, and time-varying distances to the image sources are incorporated into the state transition matrix obtained from static RIRs at the start and end point of the trajectory. The performance of the proposed approach is evaluated against state-of-the-art RIR interpolation and state-space estimation methods using simulations, demonstrating the potential of the proposed state-space model.

* Frontiers in Signal Processing 4 (2024)
* 30 pages, 13 figures

Via

Access Paper or Ask Questions

Reference Microphone Selection for the Weighted Prediction Error Algorithm using the Normalized L-p Norm

Nov 05, 2024

Anselm Lohmann, Toon van Waterschoot, Joerg Bitzer, Simon Doclo

Abstract:Reverberation may severely degrade the quality of speech signals recorded using microphones in a room. For compact microphone arrays, the choice of the reference microphone for multi-microphone dereverberation typically does not have a large influence on the dereverberation performance. In contrast, when the microphones are spatially distributed, the choice of the reference microphone may significantly contribute to the dereverberation performance. In this paper, we propose to perform reference microphone selection for the weighted prediction error (WPE) dereverberation algorithm based on the normalized $\ell_p$-norm of the dereverberated output signal. Experimental results for different source positions in a reverberant laboratory show that the proposed method yields a better dereverberation performance than reference microphone selection based on the early-to-late reverberation ratio or signal power.

Via

Access Paper or Ask Questions

Frequency Tracking Features for Data-Efficient Deep Siren Identification

Sep 13, 2024

Stefano Damiano, Thomas Dietzen, Toon van Waterschoot

Figure 1 for Frequency Tracking Features for Data-Efficient Deep Siren Identification

Figure 2 for Frequency Tracking Features for Data-Efficient Deep Siren Identification

Figure 3 for Frequency Tracking Features for Data-Efficient Deep Siren Identification

Figure 4 for Frequency Tracking Features for Data-Efficient Deep Siren Identification

Abstract:The identification of siren sounds in urban soundscapes is a crucial safety aspect for smart vehicles and has been widely addressed by means of neural networks that ensure robustness to both the diversity of siren signals and the strong and unstructured background noise characterizing traffic. Convolutional neural networks analyzing spectrogram features of incoming signals achieve state-of-the-art performance when enough training data capturing the diversity of the target acoustic scenes is available. In practice, data is usually limited and algorithms should be robust to adapt to unseen acoustic conditions without requiring extensive datasets for re-training. In this work, given the harmonic nature of siren signals, characterized by a periodically evolving fundamental frequency, we propose a low-complexity feature extraction method based on frequency tracking using a single-parameter adaptive notch filter. The features are then used to design a small-scale convolutional network suitable for training with limited data. The evaluation results indicate that the proposed model consistently outperforms the traditional spectrogram-based model when limited training data is available, achieves better cross-domain generalization and has a smaller size.

* Accepted paper: Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2024)

Via

Access Paper or Ask Questions