Xiaodan Zhuang

Segmental Attention Decoding With Long Form Acoustic Encodings

Dec 16, 2025
Via arXiv

Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition

Jan 16, 2025
Via arXiv

Optimizing Contextual Speech Recognition Using Vector Quantization for Efficient Retrieval

Nov 04, 2024
Via arXiv

Focused Discriminative Training For Streaming CTC-Trained Automatic Speech Recognition Models

Aug 23, 2024
Via arXiv

Optimizing Byte-level Representation for End-to-end ASR

Jun 14, 2024
Via arXiv

Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition

Apr 18, 2023
Via arXiv

Variable Attention Masking for Configurable Transformer Transducer Speech Recognition

Nov 02, 2022
Via arXiv

Exploring Retraining-Free Speech Recognition for Intra-sentential Code-Switching

Aug 27, 2021
Via arXiv

Frame-level SpecAugment for Deep Convolutional Neural Networks in Hybrid ASR Systems

Dec 07, 2020
Via arXiv

SNDCNN: Self-normalizing deep CNNs with scaled exponential linear units for speech recognition

Oct 09, 2019
Via arXiv