Title: Learning Multi-Target TDOA Features for Sound Event Localization and Detection

Aug 30, 2024

Axel Berg, Johanna Engman, Jens Gulin, Karl Åström, Magnus Oskarsson

Abstract: Sound event localization and detection (SELD) systems using audio recordings from a microphone array rely on spatial cues for determining the location of sound events. As a consequence, the localization performance of such systems is to a large extent determined by the quality of the audio features that are used as inputs to the system. We propose a new feature, based on neural generalized cross-correlations with phase-transform (NGCC-PHAT), that learns audio representations suitable for localization. Using permutation invariant training for the time-difference of arrival (TDOA) estimation problem enables NGCC-PHAT to learn TDOA features for multiple overlapping sound events. These features can be used as a drop-in replacement for GCC-PHAT inputs to a SELD-network. We test our method on the STARSS23 dataset and demonstrate improved localization performance compared to using standard GCC-PHAT or SALSA-Lite input features.
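
For readers unfamiliar with the baseline feature named in the abstract, the following is a minimal sketch of classical GCC-PHAT for a single microphone pair, written with NumPy. It is not the authors' NGCC-PHAT and is not taken from the paper's code; the function names `gcc_phat` and `estimate_tdoa`, the `max_delay` parameter, and the sign convention (a positive delay means the first signal lags the second) are assumptions made for illustration.

```python
import numpy as np


def gcc_phat(x1, x2, max_delay):
    """Classical GCC-PHAT between two equal-rate signals (illustrative sketch).

    Returns (delays, cc), where cc[i] is the phase-transform-weighted
    cross-correlation at a lag of delays[i] samples.
    max_delay must be a positive integer (assumed, not checked).
    """
    # Zero-pad so that the circular correlation behaves like a linear one.
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)

    # Cross-power spectrum, normalized to unit magnitude (the "PHAT" weighting).
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12  # small constant avoids division by zero

    cc = np.fft.irfft(cross, n=n)

    # Negative lags sit at the end of the inverse FFT output; move them to the front.
    cc = np.concatenate((cc[-max_delay:], cc[:max_delay + 1]))
    delays = np.arange(-max_delay, max_delay + 1)
    return delays, cc


def estimate_tdoa(x1, x2, max_delay):
    """Single-source TDOA estimate in samples: the lag of the largest peak."""
    delays, cc = gcc_phat(x1, x2, max_delay)
    return int(delays[np.argmax(cc)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.standard_normal(1024)
    mic2 = source
    # mic1 receives the same signal 5 samples later.
    mic1 = np.concatenate((np.zeros(5), source[:-5]))
    print(estimate_tdoa(mic1, mic2, max_delay=16))
```

Taking the single largest peak, as `estimate_tdoa` does, assumes one dominant source. That limitation is what the abstract's multi-target formulation addresses: with several overlapping events, the correlation has several relevant peaks, and the learned features are trained with permutation invariant training so that no fixed ordering of the targets is imposed.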

* DCASE 2024
