Title: KinSPEAK: Improving speech recognition for Kinyarwanda via semi-supervised learning methods

Aug 23, 2023

Antoine Nzeyimana

Abstract: Despite the recent availability of large transcribed Kinyarwanda speech data, achieving robust speech recognition for Kinyarwanda remains challenging. In this work, we show that three techniques significantly improve speech recognition performance for Kinyarwanda: self-supervised pre-training, a simple curriculum schedule during fine-tuning, and semi-supervised learning that leverages large unlabelled speech data. Our approach uses public domain data only. A new studio-quality speech dataset is collected from a public website and used to train a clean baseline model. The clean baseline model is then used to rank examples from a more diverse and noisy public dataset, which defines a simple curriculum training schedule. Finally, we apply semi-supervised learning to label and learn from large unlabelled data in four successive generations. Our final model achieves a 3.2% word error rate (WER) on the new dataset and a 15.9% WER on the Mozilla Common Voice benchmark, which is state-of-the-art to the best of our knowledge. Our experiments also indicate that syllabic rather than character-based tokenization results in better speech recognition performance for Kinyarwanda.

* 9 pages, 2 figures, 5 tables
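
The abstract reports results as word error rate (WER). As a minimal sketch of how that metric is usually defined, the snippet below computes the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the text normalization and scoring script used for the reported figures are not described here and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly,
    with no case folding or punctuation stripping.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder tokens, not real transcripts: one substitution (b -> x)
    # and one deletion (d) against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because the denominator is the reference length, a hypothesis with many inserted words can score above 100%. Corpus-level WER is normally computed by summing edit counts and reference lengths over all utterances rather than averaging per-utterance rates.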