Oliver Adams

LACITO

Predicting positive transfer for improved low-resource speech recognition using acoustic pseudo-tokens

Feb 03, 2024

Nay San, Georgios Paraskevopoulos, Aryaman Arora, Xiluo He, Prabhjot Kaur, Oliver Adams, Dan Jurafsky

Abstract: While massively multilingual speech models like wav2vec 2.0 XLSR-128 can be directly fine-tuned for automatic speech recognition (ASR), downstream performance can still be relatively poor on languages that are under-represented in the pre-training data. Continued pre-training on 70-200 hours of untranscribed speech in these languages can help, but what about languages without that much recorded data? For such cases, we show that supplementing the target language with data from a similar, higher-resource 'donor' language can help. For example, continued pre-training on only 10 hours of low-resource Punjabi supplemented with 60 hours of donor Hindi is almost as good as continued pre-training on 70 hours of Punjabi. By contrast, sourcing data from less similar donors like Bengali does not improve ASR performance. To inform donor language selection, we propose a novel similarity metric based on the sequence distribution of induced acoustic units: the Acoustic Token Distribution Similarity (ATDS). Across a set of typologically different target languages (Punjabi, Galician, Iban, Setswana), we show that the ATDS between the target language and its candidate donors precisely predicts target language ASR performance.

* Accepted at SIGTYP 2024
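
The abstract describes ATDS only at a high level, as a similarity over the sequence distribution of induced acoustic units. The following is a minimal sketch of how such a score could rank candidate donors. It assumes that the units are already available as integer IDs (for example, from clustering self-supervised speech features) and that the comparison is a cosine similarity between pooled unit n-gram counts. The function names are hypothetical, and the paper's exact definition of ATDS may differ.

```python
from collections import Counter
import math
from typing import Dict, List, Sequence, Tuple

def ngram_counts(utterances: List[Sequence[int]], n: int = 2) -> Counter:
    """Pool n-gram counts over utterances of discrete acoustic unit IDs.

    N-grams never span an utterance boundary.
    """
    counts: Counter = Counter()
    for units in utterances:
        for i in range(len(units) - n + 1):
            counts[tuple(units[i:i + n])] += 1
    return counts

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def token_distribution_similarity(target, donor, n: int = 2) -> float:
    """Hypothetical ATDS-like score: cosine of pooled unit n-gram counts."""
    return cosine_similarity(ngram_counts(target, n), ngram_counts(donor, n))

def rank_donors(target, donors: Dict[str, List[List[int]]], n: int = 2) -> List[Tuple[str, float]]:
    """Sort candidate donor languages by similarity to the target, best first."""
    scores = [(name, token_distribution_similarity(target, utts, n))
              for name, utts in donors.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy unit sequences with made-up IDs (not real data).
target = [[3, 7, 7, 1], [3, 7, 2]]
donors = {
    "donor_a": [[3, 7, 1, 3], [7, 7, 2]],
    "donor_b": [[5, 5, 9, 0], [5, 9, 0]],
}
# donor_a ranks first; donor_b shares no bigrams with the target.
print(rank_donors(target, donors))
```

In practice, such a score would be pooled over many hours of speech per language. This sketch does not settle which n-gram order or unit inventory size works best.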

User-friendly automatic transcription of low-resource languages: Plugging ESPnet into Elpis

Dec 15, 2020

Oliver Adams, Benjamin Galliot, Guillaume Wisniewski, Nicholas Lambourne, Ben Foley, Rahasya Sanders-Dwyer, Janet Wiles, Alexis Michaud, Séverine Guillaume, Laurent Besacier(+4 more)

Abstract: This paper reports on progress integrating the speech recognition toolkit ESPnet into Elpis, a web front-end originally designed to provide access to the Kaldi automatic speech recognition toolkit. The goal of this work is to make end-to-end speech recognition models available to language workers via a user-friendly graphical interface. Encouraging results are reported on (i) development of an ESPnet recipe for use in Elpis, with preliminary results on data sets previously used for training acoustic models with the Persephone toolkit along with a new data set that had not previously been used in speech recognition, and (ii) incorporating ESPnet into Elpis along with UI enhancements and a CUDA-supported Dockerfile.

Induced Inflection-Set Keyword Search in Speech

Oct 27, 2019

Oliver Adams, Matthew Wiesner, Jan Trmal, Garrett Nicolai, David Yarowsky

Abstract: We investigate the problem of searching for a lexeme-set in speech by searching for its inflectional variants. Experimental results indicate how lexeme-set search performance changes with the number of hypothesized inflections, while ablation experiments highlight the relative importance of different components in the lexeme-set search pipeline. We provide a recipe and evaluation set for the community to use as an extrinsic measure of the performance of inflection generation approaches.
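
The core idea of the abstract is to retrieve a lexeme by searching for any of its inflected forms. A deliberately simplified sketch can illustrate this. It assumes exact word matching over 1-best ASR transcripts and a ranked list of hypothesized inflections (for example, from some inflection-generation model). A real keyword search pipeline would instead score matches in lattices or similar structures. All names and data below are hypothetical.

```python
from typing import Dict, List, Set

def lexeme_set_search(
    inflections: List[str],
    transcripts: Dict[str, List[str]],
    top_k: int,
) -> Set[str]:
    """Return IDs of utterances whose transcript contains any of the top_k inflections.

    `inflections` is assumed to be ranked best-first by a hypothetical generator.
    """
    queries = set(inflections[:top_k])
    return {utt_id for utt_id, words in transcripts.items() if queries & set(words)}

# Toy data: hypothesized inflections of WALK and 1-best transcripts.
inflections = ["walk", "walked", "walks", "walking"]
transcripts = {
    "utt1": ["she", "walked", "home"],
    "utt2": ["he", "walks", "daily"],
    "utt3": ["they", "ran", "away"],
}

# Show how the hit set grows as more hypothesized inflections are searched.
for k in range(1, len(inflections) + 1):
    print(k, sorted(lexeme_set_search(inflections, transcripts, top_k=k)))
```

In this exact-match setting, increasing `top_k` can only add hits, which raises recall. With noisy generated inflections, however, it can also admit spurious forms that cause false alarms. This mirrors the abstract's question of how performance changes with the number of hypothesized inflections.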

Massively Multilingual Adversarial Speech Recognition

Apr 03, 2019

Oliver Adams, Matthew Wiesner, Shinji Watanabe, David Yarowsky

Abstract: We report on adaptation of multilingual end-to-end speech recognition models trained on as many as 100 languages. Our findings shed light on the relative importance of similarity between the target and pretraining languages along the dimensions of phonetics, phonology, language family, geographical location, and orthography. In this context, experiments demonstrate the effectiveness of two additional pretraining objectives in encouraging language-independent encoder representations: a context-independent phoneme objective paired with a language-adversarial classification objective.

* Accepted at NAACL-HLT 2019
