
Andreas Savas Tolias

AvaTr: One-Shot Speaker Extraction with Transformers

May 03, 2021

Shell Xu Hu, Md Rifat Arefin, Viet-Nhat Nguyen, Alish Dipani, Xaq Pitkow, Andreas Savas Tolias


Abstract: To extract the voice of a target speaker when it is mixed with a variety of other sounds, such as white and ambient noise or the voices of interfering speakers, we extend the Transformer network to attend to the most relevant information with respect to the target speaker, given the characteristics of his or her voice as a form of contextual information. The idea has a natural interpretation in terms of the selective attention theory. Specifically, we propose two models to incorporate the voice characteristics in the Transformer, based on different insights into where the feature selection should take place. Both models yield excellent performance, on par with or better than published state-of-the-art models on the speaker extraction task, including separating the speech of novel speakers not seen during training.

* 6 pages, 4 main figures, 2 supplemental figures
