Lucas Maison

Towards Early Prediction of Self-Supervised Speech Model Performance

Jan 10, 2025

Ryan Whetten, Lucas Maison, Titouan Parcollet, Marco Dinarelli, Yannick Estève

Abstract: In Self-Supervised Learning (SSL), pre-training and evaluation are resource-intensive. In the speech domain, current indicators of the quality of SSL models during pre-training, such as the loss, do not correlate well with downstream performance. Consequently, it is often difficult to gauge the final downstream performance in a cost-efficient manner during pre-training. In this work, we propose efficient unsupervised methods that give insights into the quality of the pre-training of SSL speech models, namely, measuring the cluster quality and the rank of the embeddings of the SSL model. Results show that, using only one hour of unlabeled audio, measures of cluster quality and rank correlate better with downstream performance than the pre-training loss, reducing the need for GPU hours and labeled data in SSL model evaluation.
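
The abstract does not say which rank measure is used. As a minimal sketch, assuming the common entropy-based "effective rank" (the exponential of the Shannon entropy of the normalized singular values, as popularized by RankMe), the rank of a (frames x dims) matrix of SSL embeddings could be computed as follows. The function name `effective_rank` and the `eps` smoothing term are illustrative assumptions, not the authors' code.

```python
import numpy as np


def effective_rank(embeddings: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank of a (frames x dims) embedding matrix.

    Assumption: uses the "exp of singular-value entropy" definition
    (as in RankMe); the paper's exact rank measure may differ.
    """
    # Singular values of the embedding matrix (no need for U or V)
    s = np.linalg.svd(embeddings, compute_uv=False)
    # Normalize singular values into a probability distribution;
    # eps avoids log(0) for numerically zero singular values
    p = s / (s.sum() + eps) + eps
    # Exponentiated Shannon entropy of that distribution
    return float(np.exp(-(p * np.log(p)).sum()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    healthy = rng.standard_normal((1000, 64))  # spread over many directions
    collapsed = np.outer(rng.standard_normal(1000), rng.standard_normal(64))
    print(effective_rank(healthy), effective_rank(collapsed))
```

A higher value means the embeddings spread over more directions; a collapsed representation (here a rank-1 matrix) scores close to 1. Only unlabeled audio is needed to obtain the embeddings, which matches the abstract's point about avoiding labeled data.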

Compression of Recurrent Neural Networks using Matrix Factorization

Oct 19, 2023

Lucas Maison, Hélion du Mas des Bourboux, Thomas Courtat

Abstract: Compressing neural networks is a key step when deploying models for real-time or embedded applications. Factorizing the model's matrices using low-rank approximations is a promising method for achieving compression. While it is possible to set the rank before training, this approach is neither flexible nor optimal. In this work, we propose a post-training rank-selection method called Rank-Tuning that selects a different rank for each matrix. Used in combination with training adaptations, our method achieves high compression rates with little or no performance degradation. Our numerical experiments on signal processing tasks show that we can compress recurrent neural networks by up to 14x with at most 1.4% relative performance reduction.
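
To illustrate what low-rank factorization with a per-matrix rank looks like, here is a minimal sketch using truncated SVD. The rank criterion below (keep the smallest rank that retains a chosen fraction of spectral energy) is a hypothetical stand-in chosen for simplicity; it is NOT the paper's Rank-Tuning procedure, whose details the abstract does not give. The names `factorize`, `energy`, and `compression_ratio` are illustrative.

```python
import numpy as np


def factorize(W: np.ndarray, energy: float = 0.99):
    """Split W (m x n) into A (m x r) @ B (r x n) via truncated SVD.

    Hypothetical rank criterion: smallest r whose top singular values
    retain `energy` of the squared Frobenius norm. Illustrative only,
    not the paper's Rank-Tuning method.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Cumulative fraction of spectral energy captured by the top-k values
    cum = np.cumsum(s**2) / np.sum(s**2)
    # First index reaching the target, converted to a rank, clamped
    r = min(int(np.searchsorted(cum, energy)) + 1, len(s))
    A = U[:, :r] * s[:r]  # fold singular values into the left factor
    B = Vt[:r, :]
    return A, B


def compression_ratio(m: int, n: int, r: int) -> float:
    """Parameter count of the dense m x n matrix over that of its rank-r factors."""
    return (m * n) / (r * (m + n))
```

Because the rank is chosen independently for each matrix, matrices whose spectrum decays quickly get small ranks and others keep more. The parameter arithmetic is simple: a dense 512 x 512 matrix has 262,144 weights, while rank-16 factors have 16 x (512 + 512) = 16,384, so `compression_ratio(512, 512, 16)` gives 16.0.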

Some voices are too common: Building fair speech recognition systems using the Common Voice dataset

Jun 01, 2023

Lucas Maison, Yannick Estève

Abstract: Automatic speech recognition (ASR) systems are becoming increasingly efficient thanks to new advances in neural network training such as self-supervised learning. However, they are known to be unfair toward certain groups, for instance, people speaking with an accent. In this work, we use the French Common Voice dataset to quantify the biases of a pre-trained wav2vec 2.0 model toward several demographic groups. By fine-tuning the pre-trained model on a variety of fixed-size, carefully crafted training sets, we demonstrate the importance of speaker diversity. We also run an in-depth analysis of the Common Voice corpus and identify important shortcomings that should be taken into account by users of this dataset.

* 5 pages, 3 figures. Accepted to Interspeech 2023
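
The abstract does not specify the bias metric. One common way to quantify ASR bias, assumed here for illustration, is to compute the word error rate (WER) separately for each demographic group and compare the results. The sketch below pools word-level edit distances per group; the function names and the (group, reference, hypothesis) input format are hypothetical.

```python
from collections import defaultdict


def word_errors(ref: str, hyp: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    r, h = ref.split(), hyp.split()
    # Two-row dynamic-programming Levenshtein distance over words
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, start=1):
        cur = [i] + [0] * len(h)
        for j, hw in enumerate(h, start=1):
            cur[j] = min(prev[j] + 1,               # deletion
                         cur[j - 1] + 1,            # insertion
                         prev[j - 1] + (rw != hw))  # substitution or match
        prev = cur
    return prev[-1], len(r)


def wer_by_group(samples):
    """samples: iterable of (group, reference, hypothesis) triples.

    Pools errors and reference words over each group's utterances,
    giving one corpus-level WER per group (groups with no reference
    words are skipped).
    """
    errs, words = defaultdict(int), defaultdict(int)
    for group, ref, hyp in samples:
        e, n = word_errors(ref, hyp)
        errs[group] += e
        words[group] += n
    return {g: errs[g] / words[g] for g in errs if words[g] > 0}
```

The gap between the best and worst group WER then gives a single number to track when comparing models fine-tuned on differently composed training sets.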

Improving Accented Speech Recognition with Multi-Domain Training

Mar 14, 2023

Lucas Maison, Yannick Estève

Abstract: Thanks to the rise of self-supervised learning, automatic speech recognition (ASR) systems now achieve near-human performance on a wide variety of datasets. However, they still lack generalization capability and are not robust to domain shifts such as accent variations. In this work, we use speech audio representing four different French accents to create fine-tuning datasets that improve the robustness of pre-trained ASR models. By incorporating various accents in the training set, we obtain both in-domain and out-of-domain improvements. Our numerical experiments show that we can reduce error rates by up to 25% (relative) on African and Belgian accents compared to single-domain training, while maintaining good performance on standard French.

* 5 pages, 2 figures. Accepted to ICASSP 2023
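
The "25% (relative)" figure is a relative reduction, not a drop of 25 percentage points. A minimal sketch of the arithmetic, using made-up error rates purely for illustration:

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction: (baseline - improved) / baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical numbers: a WER going from 20.0% to 15.0% is a
    # 5-point absolute drop but a 25% relative reduction.
    print(relative_reduction(20.0, 15.0))
```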
