Title: Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches

Apr 04, 2022

Zifeng Zhao, Dongchao Yang, Rongzhi Gu, Haoran Zhang, Yuexian Zou

Abstract: Recently, end-to-end speaker extraction has attracted increasing attention and shown promising results. However, its performance is often inferior to that of a blind source separation (BSS) counterpart with a similar network architecture, because the auxiliary speaker encoder may sometimes generate ambiguous speaker embeddings. Such ambiguous guidance may confuse the separation network and lead to wrong extraction results, which degrades overall performance. We refer to this as the target confusion problem. In this paper, we analyze this issue and address it in two stages. In the training phase, we propose to integrate metric learning methods to improve the distinguishability of the embeddings produced by the speaker encoder. For inference, we design a novel post-filtering strategy to revise the wrong results. Specifically, we first identify the confusion samples by measuring the similarities between output estimates and enrollment utterances, and then recover the true target sources by a subtraction operation. Experiments show a performance improvement of more than 1 dB SI-SDRi, which validates the effectiveness of our methods and emphasizes the impact of the target confusion problem.
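
The post-filtering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two-speaker assumption, the `margin` parameter, and the decision rule (compare the similarity of the estimate and of the residual to the enrollment embedding) are all assumptions made for illustration. `embed_fn` stands in for a hypothetical speaker encoder; the toy example uses a magnitude spectrum in its place.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def post_filter(mixture, estimate, enroll_emb, embed_fn, margin=0.0):
    """Illustrative post-filter for a two-speaker mixture (assumed setup).

    If the residual (mixture - estimate) is more similar to the enrollment
    embedding than the estimate itself, the sample is treated as a confusion
    sample and the residual is returned instead.
    Returns (output_signal, was_confused).
    """
    residual = mixture - estimate  # the subtraction step
    sim_est = cosine_similarity(embed_fn(estimate), enroll_emb)
    sim_res = cosine_similarity(embed_fn(residual), enroll_emb)
    confused = sim_res > sim_est + margin
    return (residual if confused else estimate), confused


# Toy data: two "speakers" are sinusoids at different frequencies, and the
# stand-in "speaker encoder" is simply the magnitude spectrum (hypothetical).
n = 256
t = np.arange(n) / n
speaker_a = np.sin(2 * np.pi * 5 * t)
speaker_b = np.sin(2 * np.pi * 12 * t)
mixture = speaker_a + speaker_b


def toy_embed(x):
    return np.abs(np.fft.rfft(x))


enroll_emb = toy_embed(0.5 * speaker_a)  # enrollment utterance of speaker A

# Simulate target confusion: the extractor returned speaker B instead of A.
fixed, was_confused = post_filter(mixture, speaker_b, enroll_emb, toy_embed)
```

In this toy case the wrong estimate is swapped for the residual, which equals the target up to floating-point error. With more than two speakers or with imperfect separation, plain subtraction would not recover a clean target, so the sketch only conveys the overall flow.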

* 5 pages, 1 table, 5 figures. Submitted to INTERSPEECH 2022
