Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Stefano Damiano

Sound Field Reconstruction Using Physics-Informed Boundary Integral Networks

Jun 04, 2025

Stefano Damiano, Toon van Waterschoot

Abstract:Sound field reconstruction refers to the problem of estimating the acoustic pressure field over an arbitrary region of space, using only a limited set of measurements. Physics-informed neural networks have been adopted to solve the problem by incorporating in the training loss function the governing partial differential equation, either the Helmholtz or the wave equation. In this work, we introduce a boundary integral network for sound field reconstruction. Relying on the Kirchhoff-Helmholtz boundary integral equation to model the sound field in a given region of space, we employ a shallow neural network to retrieve the pressure distribution on the boundary of the considered domain, enabling to accurately retrieve the acoustic pressure inside of it. Assuming the positions of measurement microphones are known, we train the model by minimizing the mean squared error between the estimated and measured pressure at those locations. Experimental results indicate that the proposed model outperforms existing physics-informed data-driven techniques.

* Accepted for publication at EUSIPCO 2025

Via

Access Paper or Ask Questions

The trajectoRIR Database: Room Acoustic Recordings Along a Trajectory of Moving Microphones

Mar 29, 2025

Stefano Damiano, Kathleen MacWilliam, Valerio Lorenzoni, Thomas Dietzen, Toon van Waterschoot

Abstract:Data availability is essential to develop acoustic signal processing algorithms, especially when it comes to data-driven approaches that demand large and diverse training datasets. For this reason, an increasing number of databases have been published in recent years, including either room impulse responses (RIRs) or recordings of moving audio. In this paper we introduce the trajectoRIR database, an extensive, multi-array collection of both dynamic and stationary acoustic recordings along a controlled trajectory in a room. Specifically, the database features recordings using moving microphones and stationary RIRs spatially sampling the room acoustics along an L-shaped, 3.74-meter-long trajectory. This combination makes trajectoRIR unique and applicable in various tasks ranging from sound source localization and tracking to spatially dynamic sound field reconstruction and system identification. The recording room has a reverberation time of 0.5 seconds, and the three different microphone configurations employed include a dummy head, with additional reference microphones located next to the ears, 3 first-order Ambisonics microphones, two circular arrays of 16 and 4 channels, and a 12-channel linear array. The motion of the microphones was achieved using a robotic cart traversing a rail at three speeds: [0.2,0.4,0.8] m/s. Audio signals were reproduced using two stationary loudspeakers. The collected database features 8648 stationary RIRs, as well as perfect sweeps, speech, music, and stationary noise recorded during motion. MATLAB and Python scripts are included to access the recorded audio as well as to retrieve geometrical information.

* 15 pages, 7 figures

Via

Access Paper or Ask Questions

A Zero-Shot Physics-Informed Dictionary Learning Approach for Sound Field Reconstruction

Dec 24, 2024

Stefano Damiano, Federico Miotello, Mirco Pezzoli, Alberto Bernardini, Fabio Antonacci, Augusto Sarti, Toon van Waterschoot

Figure 1 for A Zero-Shot Physics-Informed Dictionary Learning Approach for Sound Field Reconstruction

Figure 2 for A Zero-Shot Physics-Informed Dictionary Learning Approach for Sound Field Reconstruction

Figure 3 for A Zero-Shot Physics-Informed Dictionary Learning Approach for Sound Field Reconstruction

Abstract:Sound field reconstruction aims to estimate pressure fields in areas lacking direct measurements. Existing techniques often rely on strong assumptions or face challenges related to data availability or the explicit modeling of physical properties. To bridge these gaps, this study introduces a zero-shot, physics-informed dictionary learning approach to perform sound field reconstruction. Our method relies only on a few sparse measurements to learn a dictionary, without the need for additional training data. Moreover, by enforcing the Helmholtz equation during the optimization process, the proposed approach ensures that the reconstructed sound field is represented as a linear combination of a few physically meaningful atoms. Evaluations on real-world data show that our approach achieves comparable performance to state-of-the-art dictionary learning techniques, with the advantage of requiring only a few observations of the sound field and no training on a dataset.

* Accepted for publication at ICASSP 2025

Via

Access Paper or Ask Questions

Frequency Tracking Features for Data-Efficient Deep Siren Identification

Sep 13, 2024

Stefano Damiano, Thomas Dietzen, Toon van Waterschoot

Figure 1 for Frequency Tracking Features for Data-Efficient Deep Siren Identification

Figure 2 for Frequency Tracking Features for Data-Efficient Deep Siren Identification

Figure 3 for Frequency Tracking Features for Data-Efficient Deep Siren Identification

Figure 4 for Frequency Tracking Features for Data-Efficient Deep Siren Identification

Abstract:The identification of siren sounds in urban soundscapes is a crucial safety aspect for smart vehicles and has been widely addressed by means of neural networks that ensure robustness to both the diversity of siren signals and the strong and unstructured background noise characterizing traffic. Convolutional neural networks analyzing spectrogram features of incoming signals achieve state-of-the-art performance when enough training data capturing the diversity of the target acoustic scenes is available. In practice, data is usually limited and algorithms should be robust to adapt to unseen acoustic conditions without requiring extensive datasets for re-training. In this work, given the harmonic nature of siren signals, characterized by a periodically evolving fundamental frequency, we propose a low-complexity feature extraction method based on frequency tracking using a single-parameter adaptive notch filter. The features are then used to design a small-scale convolutional network suitable for training with limited data. The evaluation results indicate that the proposed model consistently outperforms the traditional spectrogram-based model when limited training data is available, achieves better cross-domain generalization and has a smaller size.

* Accepted paper: Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2024)

Via

Access Paper or Ask Questions

Can Synthetic Data Boost the Training of Deep Acoustic Vehicle Counting Networks?

Jan 17, 2024

Stefano Damiano, Luca Bondi, Shabnam Ghaffarzadegan, Andre Guntoro, Toon van Waterschoot

Figure 1 for Can Synthetic Data Boost the Training of Deep Acoustic Vehicle Counting Networks?

Figure 2 for Can Synthetic Data Boost the Training of Deep Acoustic Vehicle Counting Networks?

Figure 3 for Can Synthetic Data Boost the Training of Deep Acoustic Vehicle Counting Networks?

Figure 4 for Can Synthetic Data Boost the Training of Deep Acoustic Vehicle Counting Networks?

Abstract:In the design of traffic monitoring solutions for optimizing the urban mobility infrastructure, acoustic vehicle counting models have received attention due to their cost effectiveness and energy efficiency. Although deep learning has proven effective for visual traffic monitoring, its use has not been thoroughly investigated in the audio domain, likely due to real-world data scarcity. In this work, we propose a novel approach to acoustic vehicle counting by developing: i) a traffic noise simulation framework to synthesize realistic vehicle pass-by events; ii) a strategy to mix synthetic and real data to train a deep-learning model for traffic counting. The proposed system is capable of simultaneously counting cars and commercial vehicles driving on a two-lane road, and identifying their direction of travel under moderate traffic density conditions. With only 24 hours of labeled real-world traffic noise, we are able to improve counting accuracy on real-world data from $63\%$ to $88\%$ for cars and from $86\%$ to $94\%$ for commercial vehicles.

* Accepted paper: 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024)

Via

Access Paper or Ask Questions

Real-Time Acoustic Perception for Automotive Applications

Jan 30, 2023

Jun Yin, Stefano Damiano, Marian Verhelst, Toon van Waterschoot, Andre Guntoro

Abstract:In recent years the automotive industry has been strongly promoting the development of smart cars, equipped with multi-modal sensors to gather information about the surroundings, in order to aid human drivers or make autonomous decisions. While the focus has mostly been on visual sensors, also acoustic events are crucial to detect situations that require a change in the driving behavior, such as a car honking, or the sirens of approaching emergency vehicles. In this paper, we summarize the results achieved so far in the Marie Sklodowska-Curie Actions (MSCA) European Industrial Doctorates (EID) project Intelligent Ultra Low-Power Signal Processing for Automotive (I-SPOT). On the algorithmic side, the I-SPOT Project aims to enable detecting, localizing and tracking environmental audio signals by jointly developing microphone array processing and deep learning techniques that specifically target automotive applications. Data generation software has been developed to cover the I-SPOT target scenarios and research challenges. This tool is currently being used to develop low-complexity deep learning techniques for emergency sound detection. On the hardware side, the goal impels workflows for hardware-algorithm co-design to ease the generation of architectures that are sufficiently flexible towards algorithmic evolutions without giving up on efficiency, as well as enable rapid feedback of hardware implications of algorithmic decision. This is pursued though a hierarchical workflow that breaks the hardware-algorithm design space into reasonable subsets, which has been tested for operator-level optimizations on state-of-the-art robust sound source localization for edge devices. Further, several open challenges towards an end-to-end system are clarified for the next stage of I-SPOT.

Via

Access Paper or Ask Questions