Ankur Bapna

STAB: Speech Tokenizer Assessment Benchmark

Sep 04, 2024
Via arXiv

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 08, 2024
Via arXiv

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023
Via arXiv

Multimodal Modeling For Spoken Language Identification

Sep 19, 2023
Via arXiv

MADLAD-400: A Multilingual And Document-Level Large Audited Dataset

Sep 09, 2023
Via arXiv

AudioPaLM: A Large Language Model That Can Speak and Listen

Jun 22, 2023
Via arXiv

Label Aware Speech Representation Learning For Language Identification

Jun 07, 2023
Via arXiv

LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus

May 30, 2023
Via arXiv

Understanding Shared Speech-Text Representations

Apr 27, 2023
Via arXiv

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations

Mar 03, 2023
Via arXiv