Christian Fuegen

Transcribing and Translating, Fast and Slow: Joint Speech Translation and Recognition

Dec 19, 2024, via arXiv

Textless Streaming Speech-to-Speech Translation using Semantic Speech Tokens

Oct 04, 2024, via arXiv

Effective internal language model training and fusion for factorized transducer model

Apr 02, 2024, via arXiv

AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition

Jan 18, 2024, via arXiv

Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data

Nov 12, 2023, via arXiv

End-to-End Speech Recognition Contextualization with Large Language Models

Sep 19, 2023, via arXiv

Prompting Large Language Models with Speech Recognition Abilities

Jul 21, 2023, via arXiv

SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

Apr 03, 2023, via arXiv

Streaming Audio-Visual Speech Recognition with Alignment Regularization

Nov 03, 2022, via arXiv

An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition

Apr 19, 2022, via arXiv