Theodor Nguyen

Conditional Diffusion Model for Target Speaker Extraction

Oct 07, 2023

Theodor Nguyen, Guangzhi Sun, Xianrui Zheng, Chao Zhang, Philip C Woodland

Abstract: We propose DiffSpEx, a generative target speaker extraction method based on score-based generative modelling through stochastic differential equations. DiffSpEx deploys a continuous-time stochastic diffusion process in the complex short-time Fourier transform domain, starting from the target speaker source and converging to a Gaussian distribution centred on the mixture of sources. For the reverse-time process, a parametrised score function is conditioned on a target speaker embedding to extract the target speaker from the mixture of sources. We utilise ECAPA-TDNN target speaker embeddings and condition the score function alternately on the SDE time embedding and the target speaker embedding. The potential of DiffSpEx is demonstrated with the WSJ0-2mix dataset, achieving an SI-SDR of 12.9 dB and a NISQA score of 3.56. Moreover, we show that fine-tuning a pre-trained DiffSpEx model to a specific speaker further improves performance, enabling personalisation in target speaker extraction.
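The forward process described in the abstract (starting at the target speaker and drifting towards a Gaussian centred on the mixture) can be illustrated with a minimal NumPy sketch. The abstract does not give the exact SDE, so everything below is an assumption for illustration: an Ornstein-Uhlenbeck-type drift `gamma * (y - x)` with a constant diffusion coefficient `g`, for which the perturbation kernel has a closed form. The function name `perturb` and the parameters `gamma` and `g` are hypothetical and are not taken from the paper.

```python
import numpy as np

def perturb(s, y, t, gamma=2.0, g=0.5, rng=None):
    """Sample x_t from an assumed OU-type forward SDE
        dx = gamma * (y - x) dt + g dw,   x_0 = s,
    where s is the target speaker STFT and y is the mixture STFT.
    NOTE: the drift form and the constant g are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    decay = np.exp(-gamma * t)
    # The mean moves from the target (t = 0) towards the mixture (large t).
    mean = decay * s + (1.0 - decay) * y
    # Closed-form standard deviation of an OU process with constant diffusion.
    std = g * np.sqrt((1.0 - np.exp(-2.0 * gamma * t)) / (2.0 * gamma))
    # Circular complex Gaussian noise with unit variance per STFT bin.
    noise = (rng.standard_normal(s.shape)
             + 1j * rng.standard_normal(s.shape)) / np.sqrt(2.0)
    return mean + std * noise, mean, std

# Toy complex "spectrograms" (frequency bins x frames), not real audio.
rng = np.random.default_rng(0)
target = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
interferer = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
mixture = target + interferer

x_t, mean_t, std_t = perturb(target, mixture, t=0.5, rng=rng)
```

Under these assumptions, `t = 0` returns the clean target and large `t` yields samples scattered around the mixture, which is the starting point a reverse-time sampler would use before a speaker-conditioned score model guides it back towards the target.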

* 5 pages, 4 figures, submitted to ICASSP 2024
