
Krishna C. Puvvada

VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning

Oct 23, 2024
Via arXiv

Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens

Sep 10, 2024
Via arXiv

Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR

Sep 02, 2024
Via arXiv

NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks

Aug 23, 2024
Via arXiv

Less is More: Accurate Speech Recognition & Translation without Web-Scale Data

Jun 28, 2024
Via arXiv

BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5

Jun 28, 2024
Via arXiv

The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System

Oct 18, 2023
Via arXiv

SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation

Oct 13, 2023
Via arXiv

Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition

Sep 19, 2023
Via arXiv

Conformer-based Target-Speaker Automatic Speech Recognition for Single-Channel Audio

Aug 09, 2023
Via arXiv