George Saon

A Non-autoregressive Model for Joint STT and TTS

Jan 15, 2025 (via arXiv)

Bilevel Joint Unsupervised and Supervised Training for Automatic Speech Recognition

Dec 11, 2024 (via arXiv)

Exploring the limits of decoder-only models trained on public speech recognition corpora

Jan 31, 2024 (via arXiv)

Soft Random Sampling: A Theoretical and Empirical Analysis

Nov 24, 2023 (via arXiv)

Semi-Autoregressive Streaming ASR With Label Context

Sep 19, 2023 (via arXiv)

Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems

Sep 07, 2023 (via arXiv)

Diagonal State Space Augmented Transformers for Speech Recognition

Feb 27, 2023 (via arXiv)

VQ-T: RNN Transducers using Vector-Quantized Prediction Network States

Aug 03, 2022 (via arXiv)

Extending RNN-T-based speech recognition systems with emotion and language classification

Jul 28, 2022 (via arXiv)

Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization

Jun 16, 2022 (via arXiv)