Hasim Sak

Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition

Aug 05, 2024

Contrastive Siamese Network for Semi-supervised Speech Recognition

May 27, 2022

Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection

Oct 05, 2021

Reducing Streaming ASR Model Delay with Self Alignment

May 06, 2021

Transformer Transducer: One Model Unifying Streaming and Non-streaming Speech Recognition

Oct 07, 2020

A Density Ratio Approach to Language Model Fusion in End-To-End Automatic Speech Recognition

Feb 28, 2020

Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss

Feb 14, 2020

Adversarial Training for Multilingual Acoustic Modeling

Jun 17, 2019

Large-Scale Visual Speech Recognition

Oct 01, 2018

Speech recognition for medical conversations

Jun 20, 2018