Picture for Jay Mahadeokar

Jay Mahadeokar

Jack

CJST: CTC Compressor based Joint Speech and Text Training for Decoder-Only ASR

Add code
Nov 12, 2024
Viaarxiv icon

Efficient Streaming LLM for Speech Recognition

Add code
Oct 02, 2024
Viaarxiv icon

Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech

Add code
Oct 02, 2024
Viaarxiv icon

M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses

Add code
Sep 17, 2024
Figure 1 for M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses
Figure 2 for M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses
Figure 3 for M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses
Figure 4 for M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses
Viaarxiv icon

Faster Speech-LLaMA Inference with Multi-token Prediction

Add code
Sep 12, 2024
Viaarxiv icon

The Llama 3 Herd of Models

Add code
Jul 31, 2024
Viaarxiv icon

Towards scalable efficient on-device ASR with transfer learning

Add code
Jul 23, 2024
Viaarxiv icon

Effective internal language model training and fusion for factorized transducer model

Add code
Apr 02, 2024
Viaarxiv icon

Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data

Add code
Nov 12, 2023
Figure 1 for Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data
Figure 2 for Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data
Figure 3 for Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data
Figure 4 for Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data
Viaarxiv icon

Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

Add code
Sep 22, 2023
Viaarxiv icon