Title: Towards End-to-end Speaker Diarization in the Wild

Nov 02, 2022

Zexu Pan, Gordon Wichern, François G. Germain, Aswin Subramanian, Jonathan Le Roux

Abstract: Speaker diarization algorithms address the "who spoke when" problem in audio recordings. Algorithms trained end-to-end have proven superior to classical modular-cascaded systems in constrained scenarios with a small number of speakers. However, their performance for in-the-wild recordings containing more speakers with shorter utterance lengths remains to be investigated. In this paper, we address this gap, showing that an attractor-based end-to-end system can also perform remarkably well in the latter scenario when first pre-trained on a carefully designed simulated dataset that matches the distribution of in-the-wild recordings. We also propose to use an attention mechanism to increase the network capacity in decoding more speaker attractors, and to jointly train the attractors on a speaker recognition task to improve the speaker attractor representation. Even though the model we propose is audio-only, we find it significantly outperforms both audio-only and audio-visual baselines on the AVA-AVD benchmark dataset, achieving state-of-the-art results with an absolute reduction in diarization error of 23.3%.
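
To make the "diarization error" figure concrete, here is a minimal sketch of a frame-level diarization error rate. It is an illustration only, not the paper's evaluation protocol: benchmark scoring is normally time-based and may apply collars or special overlap handling. The function name `frame_der` and the 0/1 frame-by-speaker input layout are assumptions made for this example.

```python
from itertools import permutations


def frame_der(ref, hyp):
    """Simplified frame-level diarization error rate (hypothetical helper).

    ref, hyp: lists of frames; each frame is a list of 0/1 activity flags,
    one per speaker. Both must use the same number of speaker columns.
    """
    n_spk = len(ref[0])
    total_ref = sum(sum(frame) for frame in ref)
    if total_ref == 0:
        raise ValueError("reference contains no speech")

    best = None
    # Speaker labels are arbitrary, so try every mapping of hypothesis
    # columns onto reference columns and keep the one with fewest errors.
    # Brute force is fine for a handful of speakers only.
    for perm in permutations(range(n_spk)):
        errors = 0
        for r, h in zip(ref, hyp):
            mapped = [h[p] for p in perm]
            n_ref, n_hyp = sum(r), sum(mapped)
            n_correct = sum(1 for a, b in zip(r, mapped) if a and b)
            miss = max(0, n_ref - n_hyp)              # missed speech
            false_alarm = max(0, n_hyp - n_ref)       # false-alarm speech
            confusion = min(n_ref, n_hyp) - n_correct  # wrong speaker
            errors += miss + false_alarm + confusion
        if best is None or errors < best:
            best = errors
    return best / total_ref


# Toy example: the hypothesis swaps the two speaker labels and misses
# one of the two overlapping speakers in the last frame.
ref = [[1, 0], [1, 0], [0, 1], [1, 1]]
hyp = [[0, 1], [0, 1], [1, 0], [1, 0]]
print(frame_der(ref, hyp))
```

In this toy case the label swap costs nothing once the best mapping is found, and the only error is the one missed speaker out of five reference speaker-frames. An "absolute reduction" such as the one quoted in the abstract is a difference between two such rates in percentage points, not a relative change.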

* 5 pages, 2 figures, 2 tables. Submitted to ICASSP 2023
