Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:EACELEB: An East Asian Language Speaking Celebrity Dataset for Speaker Recognition

Mar 10, 2022

Desmond Caulley, Yufeng Yang, David Anderson

Figure 1 for EACELEB: An East Asian Language Speaking Celebrity Dataset for Speaker Recognition

Figure 2 for EACELEB: An East Asian Language Speaking Celebrity Dataset for Speaker Recognition

Figure 3 for EACELEB: An East Asian Language Speaking Celebrity Dataset for Speaker Recognition

Figure 4 for EACELEB: An East Asian Language Speaking Celebrity Dataset for Speaker Recognition

Share this with someone who'll enjoy it:

Abstract:Large datasets are very useful for training speaker recognition systems, and various research groups have constructed several over the years. Voxceleb is a large dataset for speaker recognition that is extracted from Youtube videos. This paper presents an audio-visual method for acquiring audio data from Youtube given the speaker's name as input. The system follows a pipeline similar to that of the Voxceleb data acquisition method. However, our work focuses on fast data acquisition by using face-tracking in subsequent frames once a face has been detected -- this is preferable over face detection for every frame considering its computational cost. We show that applying audio diarization to our data after acquiring it can yield equal error rates comparable to Voxceleb. A secondary set of experiments showed that we could further decrease the error rate by fine-tuning a pre-trained x-vector system with the acquired data. Like Voxceleb, the work here focuses primarily on developing audio for celebrities. However, unlike Voxceleb, our target audio data is from celebrities in East Asian countries. Finally, we set up a speaker verification task to evaluate the accuracy of our acquired data. After diarization and fine-tuning, we achieved an equal error rate of approximately 4\% across our entire dataset.

View paper on

Share this with someone who'll enjoy it:

Title:EACELEB: An East Asian Language Speaking Celebrity Dataset for Speaker Recognition

Paper and Code