Bethan Thomas

Efficient Adapter Transfer of Self-Supervised Speech Models for Automatic Speech Recognition

Feb 07, 2022

Bethan Thomas, Samuel Kessler, Salah Karout

Abstract: Self-supervised learning (SSL) is a powerful tool that allows learning of underlying representations from unlabeled data. Transformer-based models such as wav2vec 2.0 and HuBERT are leading the field in the speech domain. Generally, these models are fine-tuned on a small amount of labeled data for a downstream task such as Automatic Speech Recognition (ASR). This involves re-training the majority of the model for each task. Adapters are small, lightweight modules that are commonly used in Natural Language Processing (NLP) to adapt pre-trained models to new tasks. In this paper we propose applying adapters to wav2vec 2.0 to reduce the number of parameters required for downstream ASR tasks and to increase the scalability of the model to multiple tasks or languages. Using adapters, we can perform ASR while training fewer than 10% of the parameters per task compared to full fine-tuning, with little degradation of performance. Ablations show that applying adapters to just the top few layers of the pre-trained network gives similar performance to full transfer, which supports the theory that higher pre-trained layers encode more phonemic information and further improves efficiency.
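To give a rough sense of why adapters train so few parameters, the sketch below counts the weights in a generic bottleneck adapter (down-projection followed by up-projection) and compares them with a standard Transformer encoder layer. Everything here is an assumption for illustration and is not taken from the paper: the function names are hypothetical, the sizes (`d_model=768`, `d_ff=3072`, 12 layers, bottleneck of 64, two adapters per layer) are illustrative, and only the Transformer layers are counted, not any convolutional feature encoder.

```python
def adapter_param_count(d_model: int, bottleneck: int) -> int:
    """Weights and biases in one bottleneck adapter (assumed design)."""
    down = d_model * bottleneck + bottleneck  # down-projection + bias
    up = bottleneck * d_model + d_model       # up-projection + bias
    return down + up


def transformer_layer_param_count(d_model: int, d_ff: int) -> int:
    """Approximate weights in one standard Transformer encoder layer."""
    attention = 4 * (d_model * d_model + d_model)  # Q, K, V, output projections
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    layer_norms = 2 * (2 * d_model)                # scale and shift, two norms
    return attention + feed_forward + layer_norms


def trainable_fraction(num_layers: int, adapted_layers: int, d_model: int,
                       d_ff: int, bottleneck: int,
                       adapters_per_layer: int = 2) -> float:
    """Adapter parameters as a fraction of the frozen Transformer stack."""
    backbone = num_layers * transformer_layer_param_count(d_model, d_ff)
    adapters = (adapted_layers * adapters_per_layer
                * adapter_param_count(d_model, bottleneck))
    return adapters / backbone


# Illustrative sizes only (assumptions, not the paper's configuration).
all_layers = trainable_fraction(12, 12, d_model=768, d_ff=3072, bottleneck=64)
top_three = trainable_fraction(12, 3, d_model=768, d_ff=3072, bottleneck=64)
```

Under these assumed sizes, adapters in every layer amount to a few percent of the Transformer stack, and restricting them to the top three layers cuts that count by a further factor of four.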

* 5 pages, 4 figures. Accepted to ICASSP 2022

Continual-wav2vec2: an Application of Continual Learning for Self-Supervised Automatic Speech Recognition

Jul 26, 2021

Samuel Kessler, Bethan Thomas, Salah Karout

Abstract: We present a method for continual learning of speech representations for multiple languages using self-supervised learning (SSL) and apply these representations to automatic speech recognition (ASR). There is an abundance of unannotated speech, so creating self-supervised representations from raw audio and fine-tuning on small annotated datasets is a promising direction for building speech recognition systems. Wav2vec models perform SSL on raw audio in a pretraining phase and then fine-tune on a small fraction of annotated data. SSL models have produced state-of-the-art results for ASR. However, these models are very expensive to pretrain with self-supervision. We tackle the problem of learning new language representations continually from audio without forgetting a previous language representation. We use ideas from continual learning to transfer knowledge from a previous task to speed up pretraining on a new language task. Our continual-wav2vec2 model can decrease pretraining times by 32% when learning a new language task, and it learns this new audio-language representation without forgetting the previous language representation.

* 11 pages, 9 figures including references and appendix. Accepted at ICML 2021 Workshop: Self-Supervised Learning for Reasoning and Perception
