Hainan Xu

Three-in-One: Fast and Accurate Transducer for Hybrid-Autoregressive ASR

Oct 03, 2024

Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation

Sep 09, 2024

Romanization Encoding For Multilingual ASR

Jul 05, 2024

Label-Looping: Highly Efficient Decoding for Transducers

Jun 10, 2024

Speed of Light Exact Greedy Decoding for RNN-T Speech Recognition Models on GPU

Jun 06, 2024

Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition

Apr 04, 2024

TDT-KWS: Fast And Accurate Keyword Spotting Using Token-and-duration Transducer

Mar 20, 2024

Learning from Flawed Data: Weakly Supervised Automatic Speech Recognition

Sep 26, 2023

Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts

Jun 01, 2023

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations

Apr 13, 2023