Abstract: In this paper, we introduce a massively multilingual speech corpus with fine-grained phonemic transcriptions, encompassing more than 115 languages from diverse language families. Based on this multilingual dataset, we propose CLAP-IPA, a multilingual phoneme-speech contrastive embedding model capable of open-vocabulary matching between speech signals and phonemically transcribed keywords or arbitrary phrases. The proposed model was evaluated on two fieldwork speech corpora covering 97 unseen languages, exhibiting strong generalizability across languages. A comparison with a text-based model shows that using phonemes as modeling units enables much better cross-linguistic generalization than orthographic text.