Jinxi Guo

Towards scalable efficient on-device ASR with transfer learning

Jul 23, 2024
Via arXiv

Effective internal language model training and fusion for factorized transducer model

Apr 02, 2024
Via arXiv

Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

Sep 22, 2023
Via arXiv

Prompting Large Language Models with Speech Recognition Abilities

Jul 21, 2023
Via arXiv

Improving Fast-slow Encoder based Transducer with Streaming Deliberation

Dec 15, 2022
Via arXiv

Biased Self-supervised learning for ASR

Nov 04, 2022
Via arXiv

VADOI: Voice-Activity-Detection Overlapping Inference For End-to-end Long-form Speech Recognition

Feb 22, 2022
Via arXiv

REDAT: Accent-Invariant Representation for End-to-End ASR by Domain Adversarial Training with Relabeling

Dec 14, 2020
Via arXiv

Variable frame rate-based data augmentation to handle speaking-style variability for automatic speaker verification

Aug 08, 2020
Via arXiv

Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition

Jul 27, 2020
Via arXiv