Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chaitanya Talnikar

VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation

Jan 02, 2021

Changhan Wang, Morgane Rivière, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux

Figure 1 for VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation

Figure 2 for VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation

Figure 3 for VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation

Figure 4 for VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation

Abstract:We introduce VoxPopuli, a large-scale multilingual corpus providing 100K hours of unlabelled speech data in 23 languages. It is the largest open data to date for unsupervised representation learning as well as semi-supervised learning. VoxPopuli also contains 1.8K hours of transcribed speeches in 16 languages and their aligned oral interpretations into 5 other languages totaling 5.1K hours. We provide speech recognition baselines and validate the versatility of VoxPopuli unlabelled data in semi-supervised learning under challenging out-of-domain settings. We will release the corpus at https://github.com/facebookresearch/voxpopuli under an open license.

Via

Access Paper or Ask Questions

Joint Masked CPC and CTC Training for ASR

Oct 30, 2020

Chaitanya Talnikar, Tatiana Likhomanenko, Ronan Collobert, Gabriel Synnaeve

Figure 1 for Joint Masked CPC and CTC Training for ASR

Figure 2 for Joint Masked CPC and CTC Training for ASR

Figure 3 for Joint Masked CPC and CTC Training for ASR

Figure 4 for Joint Masked CPC and CTC Training for ASR

Abstract:Self-supervised learning (SSL) has shown promise in learning representations of audio that are useful for automatic speech recognition (ASR). But, training SSL models like wav2vec~2.0 requires a two-stage pipeline. In this paper we demonstrate a single-stage training of ASR models that can utilize both unlabeled and labeled data. During training, we alternately minimize two losses: an unsupervised masked Contrastive Predictive Coding (CPC) loss and the supervised audio-to-text alignment loss Connectionist Temporal Classification (CTC). We show that this joint training method directly optimizes performance for the downstream ASR task using unsupervised data while achieving similar word error rates to wav2vec~2.0 on the Librispeech 100-hour dataset. Finally, we postulate that solving the contrastive task is a regularization for the supervised CTC loss.

Via

Access Paper or Ask Questions

Linear Range in Gradient Descent

May 23, 2019

Angxiu Ni, Chaitanya Talnikar

Figure 1 for Linear Range in Gradient Descent

Figure 2 for Linear Range in Gradient Descent

Figure 3 for Linear Range in Gradient Descent

Figure 4 for Linear Range in Gradient Descent

Abstract:This paper defines linear range as the range of parameter perturbations which lead to approximately linear perturbations in the states of a network. We compute linear range from the difference between actual perturbations in states and the tangent solution. Linear range is a new criterion for estimating the effectivenss of gradients and thus having many possible applications. In particular, we propose that the optimal learning rate at the initial stages of training is such that parameter changes on all minibatches are within linear range. We demonstrate our algorithm on two shallow neural networks and a ResNet.

* 9 pages, 4 figures

Via

Access Paper or Ask Questions