Michael Auli

Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking

Sep 27, 2024
Via arXiv

Scaling A Simple Approach to Zero-Shot Speech Recognition

Jul 25, 2024
Via arXiv

Toward Joint Language Modeling for Speech Units and Text

Oct 12, 2023
Via arXiv

Scaling Speech Technology to 1,000+ Languages

May 22, 2023
Via arXiv

DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

May 17, 2023
Via arXiv

AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations

Feb 10, 2023
Via arXiv

Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language

Dec 14, 2022
Via arXiv

Simple and Effective Unsupervised Speech Translation

Oct 18, 2022
Via arXiv

Masked Autoencoders that Listen

Jul 13, 2022
Via arXiv

Wav2Vec-Aug: Improved self-supervised training with limited data

Jun 27, 2022
Via arXiv