Title: Large-scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification

Oct 12, 2021

Zhengyang Chen, Sanyuan Chen, Yu Wu, Yao Qian, Chengyi Wang, Shujie Liu, Yanmin Qian, Michael Zeng

Abstract: Speech representations learned from large-scale unlabeled data have shown better generalizability than those from supervised learning and have therefore attracted considerable interest for use in various downstream tasks. In this paper, we explore the limits of speech representations learned by different self-supervised objectives and datasets for automatic speaker verification (ASV), especially with a well-recognized SOTA ASV model, ECAPA-TDNN [1], as the downstream model. The representations from all hidden layers of the pre-trained model are first averaged with learnable weights and then fed into the ECAPA-TDNN as input features. The experimental results on the VoxCeleb dataset show that the weighted average representation is significantly superior to FBank, a conventional handcrafted feature for ASV. Our best single system achieves 0.564%, 0.561%, and 1.230% equal error rate (EER) on the three official trials of VoxCeleb1, respectively. The ensemble system with three pre-trained models further improves the EER to 0.431%, 0.507%, and 1.081%. Of the three evaluation trials, our best system outperforms the winning system [2] of the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC2021) on the VoxCeleb1-E trial.
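
The layer-weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the softmax normalization of the layer weights, the function names, and the toy dimensions are all assumptions made for illustration. In a real system the weights would be trained jointly with the downstream model.

```python
import math


def softmax(logits):
    """Normalize raw per-layer scores into weights that sum to 1 (assumed normalization)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_average(hidden_states, layer_logits):
    """Combine per-layer features into one feature sequence.

    hidden_states: nested list indexed as [layer][frame][dim]
    layer_logits:  one learnable scalar per layer (hypothetical parameter name)
    Returns a nested list indexed as [frame][dim].
    """
    weights = softmax(layer_logits)
    num_frames = len(hidden_states[0])
    feat_dim = len(hidden_states[0][0])
    return [
        [
            sum(w * layer[t][d] for w, layer in zip(weights, hidden_states))
            for d in range(feat_dim)
        ]
        for t in range(num_frames)
    ]


# Toy input: 3 layers, 2 frames, 2 feature dimensions.
hidden_states = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[2.0, 3.0], [4.0, 5.0]],
    [[3.0, 4.0], [5.0, 6.0]],
]
layer_logits = [0.0, 0.0, 0.0]  # equal logits give a plain average over layers
features = weighted_layer_average(hidden_states, layer_logits)
print(features)
```

With equal logits the result is the plain mean over layers. During training, the logits would shift so that the layers most useful for speaker verification receive larger weights. The resulting frame-level features take the place of FBank as input to the downstream model.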

* submitted to ICASSP 2022
