Egor Lakomkin

Jack

Textless Streaming Speech-to-Speech Translation using Semantic Speech Tokens

Oct 04, 2024
Via arXiv

Efficient Streaming LLM for Speech Recognition

Oct 02, 2024
Via arXiv

Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech

Oct 02, 2024
Via arXiv

M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses

Sep 17, 2024
Via arXiv

The Llama 3 Herd of Models

Jul 31, 2024
Via arXiv

Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data

Nov 12, 2023
Via arXiv

End-to-End Speech Recognition Contextualization with Large Language Models

Sep 19, 2023
Via arXiv

Prompting Large Language Models with Speech Recognition Abilities

Jul 21, 2023
Via arXiv

SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

Apr 03, 2023
Via arXiv

Egocentric Audio-Visual Noise Suppression

Nov 07, 2022
Via arXiv