Jinxi Guo

Transducer-Llama: Integrating LLMs into Streamable Transducer-based Speech Recognition

Dec 21, 2024
Via arXiv

Towards scalable efficient on-device ASR with transfer learning

Jul 23, 2024
Via arXiv

Effective internal language model training and fusion for factorized transducer model

Apr 02, 2024
Via arXiv

Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

Sep 22, 2023
Via arXiv

Prompting Large Language Models with Speech Recognition Abilities

Jul 21, 2023
Via arXiv

Improving Fast-slow Encoder based Transducer with Streaming Deliberation

Dec 15, 2022
Via arXiv

Biased Self-supervised learning for ASR

Nov 04, 2022
Via arXiv

VADOI: Voice-Activity-Detection Overlapping Inference For End-to-end Long-form Speech Recognition

Feb 22, 2022
Via arXiv

REDAT: Accent-Invariant Representation for End-to-End ASR by Domain Adversarial Training with Relabeling

Dec 14, 2020
Via arXiv

Variable frame rate-based data augmentation to handle speaking-style variability for automatic speaker verification

Aug 08, 2020
Via arXiv