Title: Adapting pretrained speech model for Mandarin lyrics transcription and alignment

Nov 21, 2023

Jun-You Wang, Chon-In Leong, Yu-Chen Lin, Li Su, Jyh-Shing Roger Jang

Abstract: The tasks of automatic lyrics transcription and lyrics alignment have seen significant performance improvements in the past few years. However, most previous work focuses only on English, for which large-scale datasets are available. In this paper, we address lyrics transcription and alignment of polyphonic Mandarin pop music in a low-resource setting. To deal with data scarcity, we adapt the pretrained Whisper model and fine-tune it on a monophonic Mandarin singing dataset. With data augmentation and a source separation model, the proposed method achieves a character error rate of less than 18% on a Mandarin polyphonic dataset for lyrics transcription, and a mean absolute error of 0.071 seconds for lyrics alignment. Our results demonstrate the potential of adapting a pretrained speech model for lyrics transcription and alignment in low-resource scenarios.

* Accepted by ASRU 2023
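
The abstract reports two metrics: character error rate (CER) for transcription and mean absolute error (MAE) for alignment. The following is a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code. The function names are hypothetical, and the abstract does not specify the text normalization or the unit (character, word, or line) at which alignment timestamps are compared, so those details are assumptions here.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = character-level edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def mean_absolute_error(ref_times: Sequence[float],
                        pred_times: Sequence[float]) -> float:
    """MAE in seconds between paired reference and predicted timestamps.

    Assumption: both lists are already paired one-to-one, e.g. one onset
    time per lyric unit.
    """
    if not ref_times or len(ref_times) != len(pred_times):
        raise ValueError("timestamp lists must be non-empty and equal length")
    total = sum(abs(r - p) for r, p in zip(ref_times, pred_times))
    return total / len(ref_times)
```

For Mandarin, CER is computed over characters rather than words because written Chinese has no whitespace word boundaries; a single substituted character in a five-character reference gives a CER of 0.2.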
