Title: Single microphone speaker extraction using unified time-frequency Siamese-Unet

Mar 06, 2022

Aviad Eisenberg, Sharon Gannot, Shlomo E. Chazan

Abstract: In this paper, we present a unified time-frequency method for speaker extraction in clean and noisy conditions. Given a mixed signal and a reference signal, common approaches for extracting the desired speaker operate either in the time domain or in the frequency domain. We propose a Siamese-Unet architecture that uses both representations. The Siamese encoders are applied in the frequency domain to infer embeddings of the noisy and reference spectra, respectively. The concatenated representations are then fed into the decoder to estimate the real and imaginary components of the desired speaker, which are then inverse-transformed to the time domain. The model is trained with the Scale-Invariant Signal-to-Distortion Ratio (SI-SDR) loss to exploit the time-domain information. The time-domain loss is also regularized with a frequency-domain loss to preserve the speech patterns. Experimental results demonstrate that the unified approach is not only very easy to train, but also outperforms state-of-the-art (SOTA) Blind Source Separation (BSS) methods, as well as a commonly used speaker extraction approach.
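The training objective described in the abstract, a time-domain SI-SDR loss regularized by a frequency-domain term, can be sketched roughly as follows. This is a minimal NumPy illustration and not the authors' implementation. The abstract does not specify the form of the frequency-domain loss, so the L1 distance between STFT magnitudes, the weight `alpha`, the frame settings, and the helper names `si_sdr`, `stft_mag`, and `combined_loss` are all assumptions made for illustration.

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    # Remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the scaled target component.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    residual = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)


def stft_mag(x, frame_len=256, hop=128):
    """Magnitude STFT using a Hann window (frame settings are assumptions)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def combined_loss(estimate, target, alpha=0.1):
    """Negative SI-SDR plus a weighted frequency-domain term.

    The L1 magnitude distance and the weight alpha are hypothetical choices;
    the abstract only states that a frequency-domain loss is added.
    """
    freq_term = np.mean(np.abs(stft_mag(estimate) - stft_mag(target)))
    return -si_sdr(estimate, target) + alpha * freq_term


# Toy usage: a sinusoid standing in for the desired speaker's signal.
rng = np.random.default_rng(0)
target = np.sin(2 * np.pi * 0.01 * np.arange(4000))
good_estimate = target + 0.01 * rng.standard_normal(4000)
poor_estimate = target + 0.5 * rng.standard_normal(4000)
loss_good = combined_loss(good_estimate, target)
loss_poor = combined_loss(poor_estimate, target)
```

In this toy setup the estimate with less added noise receives the lower loss. Because SI-SDR ignores the overall gain of the estimate, a magnitude-based frequency term is one plausible way to keep the output spectrum close to that of the target.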
