Title: Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations

Jun 16, 2022

Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura

Abstract: Target speech extraction is a technique to extract the target speaker's voice from mixture signals using a pre-recorded enrollment utterance that characterizes the target speaker's voice. One major difficulty of target speech extraction lies in handling variability in "intra-speaker" characteristics, i.e., the mismatch in characteristics between the target speech and an enrollment utterance. While most conventional approaches focus on improving average performance over a set of enrollment utterances, here we propose to guarantee worst-case performance, which we believe is of great practical importance. We propose an evaluation metric called worst-enrollment source-to-distortion ratio (SDR) to quantitatively measure robustness to enrollment variations. We also introduce a novel training scheme that aims to directly optimize worst-case performance by focusing training on difficult enrollment cases where extraction does not perform well. In addition, we investigate the effectiveness of an auxiliary speaker identification loss (SI-loss) as another way to improve robustness across enrollments. Experimental validation shows that both worst-enrollment target training and SI-loss training improve robustness to enrollment variations by increasing speaker discriminability.

* 5 pages, 2 figures, 3 tables. Submitted to Interspeech 2022.
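
As an illustration of the worst-enrollment SDR idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that each mixture is extracted once per available enrollment utterance, that the metric takes the lowest SDR over enrollments for each mixture, and that it then averages over mixtures. The abstract does not spell out this definition, so treat it as an assumption. The function names `sdr_db` and `worst_enrollment_sdr` are hypothetical, and `sdr_db` is a plain energy-ratio SDR without the scale or filter compensation that evaluation toolkits typically apply.

```python
import numpy as np


def sdr_db(reference, estimate, eps=1e-8):
    """Plain SDR in dB: reference energy over residual energy.

    Assumption: no scale-invariant or filter-based compensation is applied.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    residual = reference - estimate
    ratio = (np.sum(reference ** 2) + eps) / (np.sum(residual ** 2) + eps)
    return 10.0 * np.log10(ratio)


def worst_enrollment_sdr(sdr_matrix):
    """Summarize SDRs where sdr_matrix[i, j] is mixture i with enrollment j.

    Returns (worst, average):
      worst   = mean over mixtures of the minimum SDR across enrollments
                (assumed definition of worst-enrollment SDR),
      average = mean SDR over all mixture/enrollment pairs.
    """
    sdr_matrix = np.asarray(sdr_matrix, dtype=float)
    worst = float(sdr_matrix.min(axis=1).mean())
    average = float(sdr_matrix.mean())
    return worst, average


if __name__ == "__main__":
    # Hypothetical SDR values (dB): 2 mixtures, 3 enrollment utterances each.
    scores = [[10.0, 2.0, 8.0],
              [12.0, 11.0, -1.0]]
    worst, average = worst_enrollment_sdr(scores)
    print(f"worst-enrollment SDR: {worst:.2f} dB, average SDR: {average:.2f} dB")
```

In this toy example the average looks healthy while the per-mixture minimum exposes one poorly matched enrollment per mixture, which is the kind of gap the metric is meant to surface.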
