Picture for Soumi Maiti

Soumi Maiti

Text-To-Speech Synthesis In The Wild

Add code
Sep 13, 2024
Viaarxiv icon

SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data

Add code
Aug 01, 2024
Viaarxiv icon

Towards Robust Speech Representation Learning for Thousands of Languages

Add code
Jul 02, 2024
Viaarxiv icon

TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages

Add code
Feb 25, 2024
Figure 1 for TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
Figure 2 for TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
Figure 3 for TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
Figure 4 for TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
Viaarxiv icon

SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition

Add code
Jan 31, 2024
Viaarxiv icon

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics

Add code
Jan 30, 2024
Viaarxiv icon

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Add code
Oct 02, 2023
Viaarxiv icon

Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech

Add code
Oct 01, 2023
Viaarxiv icon

Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning

Add code
Sep 28, 2023
Viaarxiv icon

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study

Add code
Sep 27, 2023
Figure 1 for Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Figure 2 for Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Figure 3 for Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Figure 4 for Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Viaarxiv icon