Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations

Jul 03, 2024

Kunal Dhawan, Nithin Rao Koluguri, Ante Jukić, Ryan Langman, Jagadeesh Balam, Boris Ginsburg

Figure 1 for Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations

Figure 2 for Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations

Figure 3 for Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations

Figure 4 for Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations

Share this with someone who'll enjoy it:

Abstract:Discrete speech representations have garnered recent attention for their efficacy in training transformer-based models for various speech-related tasks such as automatic speech recognition (ASR), translation, speaker verification, and joint speech-text foundational models. In this work, we present a comprehensive analysis on building ASR systems with discrete codes. We investigate different methods for codec training such as quantization schemes and time-domain vs spectral feature encodings. We further explore ASR training techniques aimed at enhancing performance, training efficiency, and noise robustness. Drawing upon our findings, we introduce a codec ASR pipeline that outperforms Encodec at similar bit-rate. Remarkably, it also surpasses the state-of-the-art results achieved by strong self-supervised models on the 143 languages ML-SUPERB benchmark despite being smaller in size and pretrained on significantly less data.

* Accepted at Interspeech 2024

View paper on

Share this with someone who'll enjoy it:

Title:Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations

Paper and Code