Picture for Christian Fuegen

Christian Fuegen

Textless Streaming Speech-to-Speech Translation using Semantic Speech Tokens

Add code
Oct 04, 2024
Viaarxiv icon

Effective internal language model training and fusion for factorized transducer model

Add code
Apr 02, 2024
Viaarxiv icon

AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition

Add code
Jan 18, 2024
Figure 1 for AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition
Figure 2 for AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition
Figure 3 for AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition
Figure 4 for AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition
Viaarxiv icon

Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data

Add code
Nov 12, 2023
Figure 1 for Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data
Figure 2 for Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data
Figure 3 for Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data
Figure 4 for Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data
Viaarxiv icon

End-to-End Speech Recognition Contextualization with Large Language Models

Add code
Sep 19, 2023
Figure 1 for End-to-End Speech Recognition Contextualization with Large Language Models
Figure 2 for End-to-End Speech Recognition Contextualization with Large Language Models
Figure 3 for End-to-End Speech Recognition Contextualization with Large Language Models
Figure 4 for End-to-End Speech Recognition Contextualization with Large Language Models
Viaarxiv icon

Prompting Large Language Models with Speech Recognition Abilities

Add code
Jul 21, 2023
Figure 1 for Prompting Large Language Models with Speech Recognition Abilities
Figure 2 for Prompting Large Language Models with Speech Recognition Abilities
Figure 3 for Prompting Large Language Models with Speech Recognition Abilities
Figure 4 for Prompting Large Language Models with Speech Recognition Abilities
Viaarxiv icon

SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

Add code
Apr 03, 2023
Viaarxiv icon

Streaming Audio-Visual Speech Recognition with Alignment Regularization

Add code
Nov 03, 2022
Viaarxiv icon

An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition

Add code
Apr 19, 2022
Figure 1 for An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition
Figure 2 for An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition
Figure 3 for An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition
Figure 4 for An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition
Viaarxiv icon

Scaling ASR Improves Zero and Few Shot Learning

Add code
Nov 29, 2021
Figure 1 for Scaling ASR Improves Zero and Few Shot Learning
Figure 2 for Scaling ASR Improves Zero and Few Shot Learning
Figure 3 for Scaling ASR Improves Zero and Few Shot Learning
Figure 4 for Scaling ASR Improves Zero and Few Shot Learning
Viaarxiv icon