Qiujia Li

MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events

Sep 25, 2024

Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation

Feb 20, 2024

Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR

Jan 17, 2024

Massive End-to-end Models for Short Search Queries

Sep 22, 2023

Modular Domain Adaptation for Conformer-Based Streaming ASR

May 22, 2023

Knowledge Distillation from Multiple Foundation Models for End-to-End Speech Recognition

Mar 20, 2023

Knowledge Distillation for Neural Transducers from Large Self-Supervised Pre-trained Models

Oct 07, 2021

Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition

Oct 07, 2021

Combining Frame-Synchronous and Label-Synchronous Systems for Speech Recognition

Jul 01, 2021

Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction

Apr 26, 2021