Ozlem Kalinli

Sid

Effective Text Adaptation for LLM-based ASR through Soft Prompt Fine-Tuning

Dec 09, 2024
Via arXiv

CJST: CTC Compressor based Joint Speech and Text Training for Decoder-Only ASR

Nov 12, 2024
Via arXiv

Efficient Streaming LLM for Speech Recognition

Oct 02, 2024
Via arXiv

Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech

Oct 02, 2024
Via arXiv

M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses

Sep 17, 2024
Via arXiv

Faster Speech-LLaMA Inference with Multi-token Prediction

Sep 12, 2024
Via arXiv

The Llama 3 Herd of Models

Jul 31, 2024
Via arXiv

Token-Weighted RNN-T for Learning from Flawed Data

Jun 26, 2024
Via arXiv

Effective internal language model training and fusion for factorized transducer model

Apr 02, 2024
Via arXiv

Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data

Nov 12, 2023
Via arXiv