Title: Identify Speakers in Cocktail Parties with End-to-End Attention

May 22, 2020

Junzhe Zhu, Mark Hasegawa-Johnson, Leda Sari

Abstract: In scenarios where multiple speakers talk at the same time, it is important to be able to identify the talkers accurately. This paper presents an end-to-end system that integrates speech source extraction and speaker identification, and proposes a new way to jointly optimize these two parts by max-pooling the speaker predictions along the channel dimension. Residual attention permits us to learn spectrogram masks that are optimized for the purpose of speaker identification, while residual forward connections permit dilated convolution with a sufficiently large context window to guarantee correct streaming across syllable boundaries. End-to-end training results in a system that recognizes one speaker in a two-speaker broadcast speech mixture with 99.9% accuracy and both speakers with 93.9% accuracy, and that recognizes all speakers in three-speaker scenarios with 81.2% accuracy.
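
To make the pooling step concrete, here is a minimal NumPy sketch of max-pooling speaker predictions along the channel dimension. It is an illustration under stated assumptions, not the authors' implementation: it assumes each extraction channel yields one vector of speaker logits, and the names `pool_speaker_predictions` and `top_k_speakers`, the array shapes, and the toy values are all hypothetical.

```python
import numpy as np


def pool_speaker_predictions(channel_logits):
    """Max-pool per-channel speaker logits along the channel axis.

    channel_logits: array-like of shape (num_channels, num_speakers),
        one row of speaker scores per extracted channel (assumed layout).
    Returns an array of shape (num_speakers,) holding, for each speaker,
    the highest score that any channel assigned to that speaker.
    """
    channel_logits = np.asarray(channel_logits, dtype=float)
    return channel_logits.max(axis=0)


def top_k_speakers(pooled_logits, k):
    """Return the indices of the k highest pooled scores, best first."""
    order = np.argsort(-pooled_logits, kind="stable")
    return order[:k].tolist()


# Toy example: 2 channels, 4 enrolled speakers (made-up numbers).
logits = np.array([
    [4.0, -2.0, -1.0, -3.0],   # channel 0 favors speaker 0
    [-1.5, -2.5, 3.0, -0.5],   # channel 1 favors speaker 2
])
pooled = pool_speaker_predictions(logits)
predicted = top_k_speakers(pooled, k=2)
```

Because the pooled score for a speaker is high whenever any single channel detects that speaker, a loss computed on the pooled vector does not need to know which channel carries which talker. That is one plausible reading of why pooling along the channel dimension lets the extraction and identification parts be optimized jointly.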

* Submitted to Interspeech 2020; GitHub link: https://github.com/JunzheJosephZhu/Identifying-Speakers-in-Cocktail-Parties-with-E2E-Attention
