
Brian Kingsbury

Exploring the limits of decoder-only models trained on public speech recognition corpora

Jan 31, 2024

Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization

Jan 13, 2024

Soft Random Sampling: A Theoretical and Empirical Analysis

Nov 24, 2023

Semi-Autoregressive Streaming ASR With Label Context

Sep 19, 2023

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages

May 21, 2023

High-Dimensional Smoothed Entropy Estimation via Dimensionality Reduction

May 11, 2023

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

Oct 07, 2022

VQ-T: RNN Transducers using Vector-Quantized Prediction Network States

Aug 03, 2022

Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization

Jun 16, 2022

Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems

Apr 11, 2022