Tolúlopé Ògúnrèmí

Multilingual self-supervised speech representations improve the speech recognition of low-resource African languages with codeswitching

Nov 25, 2023

Tolúlopé Ògúnrèmí, Christopher D. Manning, Dan Jurafsky

Abstract: While many speakers of low-resource languages regularly code-switch between their languages and other regional languages or English, datasets of code-switched speech are too small to train bespoke acoustic models from scratch or to do language model rescoring. Here we propose finetuning self-supervised speech representations such as wav2vec 2.0 XLSR to recognize code-switched data. We find that finetuning self-supervised multilingual representations and augmenting them with n-gram language models trained from transcripts reduces absolute word error rates by up to 20% compared to baselines of hybrid models trained from scratch on code-switched data. Our findings suggest that in circumstances with limited training data, finetuning self-supervised representations is a better-performing and viable solution.

* 5 pages, 1 figure. Computational Approaches to Linguistic Code-Switching, CALCS 2023 (co-located with EMNLP 2023)
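
The abstract reports an absolute reduction in word error rate (WER). As a minimal sketch of what that metric measures, the snippet below computes WER as word-level edit distance divided by reference length, then contrasts absolute and relative reduction. The function name and the toy sentences are illustrative assumptions, not material from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Toy example (hypothetical transcripts, not results from the paper).
reference = "the cat sat on the mat"
baseline_wer = word_error_rate(reference, "the cat sit on mat")   # 2 errors / 6 words
finetuned_wer = word_error_rate(reference, "the cat sat on mat")  # 1 error / 6 words

absolute_reduction = baseline_wer - finetuned_wer
relative_reduction = absolute_reduction / baseline_wer
print(f"absolute: {absolute_reduction:.3f}, relative: {relative_reduction:.3f}")
```

An absolute reduction is a difference in percentage points between two WER values, whereas a relative reduction is that difference divided by the baseline WER; the abstract's "up to 20%" refers to the former.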

Automated speech tools for helping communities process restricted-access corpora for language revival efforts

Apr 24, 2022

Nay San, Martijn Bartelds, Tolúlopé Ògúnrèmí, Alison Mount, Ruben Thompson, Michael Higgins, Roy Barker, Jane Simpson, Dan Jurafsky

Abstract: Many archival recordings of speech from endangered languages remain unannotated and inaccessible to community members and language learning programs. One bottleneck is the time-intensive nature of annotation. An even narrower bottleneck occurs for recordings with access constraints, such as language that must be vetted or filtered by authorised community members before annotation can begin. We propose a privacy-preserving workflow to widen both bottlenecks for recordings where speech in the endangered language is intermixed with a more widely used language such as English for metalinguistic commentary and questions (e.g. What is the word for 'tree'?). We integrate voice activity detection (VAD), spoken language identification (SLI), and automatic speech recognition (ASR) to transcribe the metalinguistic content, which an authorised person can quickly scan to triage recordings that can be annotated by people with lower levels of access. We report work in progress on processing 136 hours of archival audio containing a mix of English and Muruwari. Our collaborative work with the Muruwari custodian of the archival materials shows that this workflow reduces metalanguage transcription time by 20% even given only minimal amounts of annotated training data: 10 utterances per language for SLI, and for ASR at most 39 minutes and possibly as little as 39 seconds.

* Accepted at ComputEL-5
