Picture for Boris Ginsburg

Boris Ginsburg

Anticipating Future with Large Language Model for Simultaneous Machine Translation

Add code
Oct 29, 2024
Viaarxiv icon

VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning

Add code
Oct 23, 2024
Viaarxiv icon

Three-in-One: Fast and Accurate Transducer for Hybrid-Autoregressive ASR

Add code
Oct 03, 2024
Viaarxiv icon

nGPT: Normalized Transformer with Representation Learning on the Hypersphere

Add code
Oct 01, 2024
Viaarxiv icon

Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data

Add code
Sep 30, 2024
Viaarxiv icon

Chain-of-Thought Prompting for Speech Translation

Add code
Sep 17, 2024
Figure 1 for Chain-of-Thought Prompting for Speech Translation
Figure 2 for Chain-of-Thought Prompting for Speech Translation
Figure 3 for Chain-of-Thought Prompting for Speech Translation
Figure 4 for Chain-of-Thought Prompting for Speech Translation
Viaarxiv icon

Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition

Add code
Sep 17, 2024
Figure 1 for Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
Figure 2 for Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
Figure 3 for Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
Figure 4 for Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
Viaarxiv icon

Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens

Add code
Sep 10, 2024
Figure 1 for Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens
Figure 2 for Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens
Figure 3 for Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens
Figure 4 for Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens
Viaarxiv icon

Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation

Add code
Sep 09, 2024
Figure 1 for Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation
Figure 2 for Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation
Figure 3 for Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation
Figure 4 for Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation
Viaarxiv icon

Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR

Add code
Sep 02, 2024
Viaarxiv icon