Title: Improving noisy student training for low-resource languages in End-to-End ASR using CycleGAN and inter-domain losses

Jul 26, 2024

Chia-Yu Li, Ngoc Thang Vu

Abstract: Noisy student training has significantly improved the performance of semi-supervised end-to-end speech recognition systems. However, the approach requires substantial amounts of paired speech-text data and unlabeled speech, which are costly to obtain for low-resource languages. This paper therefore considers a more extreme case of semi-supervised end-to-end automatic speech recognition, with limited paired speech-text data, limited unlabeled speech (less than five hours), and abundant external text. First, we observe improved performance when the model is trained solely with external text using our previous semi-supervised learning method, "CycleGAN and inter-domain losses." Second, we enhance "CycleGAN and inter-domain losses" with automatic hyperparameter tuning and call the result "enhanced CycleGAN inter-domain losses." Third, we integrate the enhanced method into the noisy student training pipeline for low-resource scenarios. Experiments on six non-English languages from Voxforge and Common Voice show a 20% word error rate reduction compared to the baseline teacher model and a 10% word error rate reduction compared to the baseline best student model.

* 10 pages (2 for references), 4 figures, published in SIGUL2024@LREC-COLING 2024
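
The abstract reports its results as word error rate (WER) reductions. As a reading aid, the following is a minimal sketch of how WER is conventionally computed (word-level edit distance divided by the reference length) and of how a relative reduction between two systems would be derived. The function names are hypothetical, and the abstract does not state whether its reductions are relative or absolute, so the relative form shown here is an assumption rather than the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer (assumed relative form)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

With purely illustrative numbers that are not taken from the paper, a teacher at 0.50 WER and a student at 0.40 WER would give `relative_reduction(0.50, 0.40)` of about 0.20, i.e. a 20% relative reduction.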
