Hainan Xu

Word Level Timestamp Generation for Automatic Speech Recognition and Translation

May 21, 2025

WIND: Accelerated RNN-T Decoding with Windowed Inference for Non-blank Detection

May 19, 2025

Three-in-One: Fast and Accurate Transducer for Hybrid-Autoregressive ASR

Oct 03, 2024

Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation

Sep 09, 2024

Romanization Encoding For Multilingual ASR

Jul 05, 2024

Label-Looping: Highly Efficient Decoding for Transducers

Jun 10, 2024

Speed of Light Exact Greedy Decoding for RNN-T Speech Recognition Models on GPU

Jun 06, 2024

Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition

Apr 04, 2024

TDT-KWS: Fast And Accurate Keyword Spotting Using Token-and-duration Transducer

Mar 20, 2024

Learning from Flawed Data: Weakly Supervised Automatic Speech Recognition

Sep 26, 2023