Soumi Maiti

ASVspoof 5: Design, Collection and Validation of Resources for Spoofing, Deepfake, and Adversarial Attack Detection Using Crowdsourced Speech

Feb 13, 2025 (arXiv)

Text-To-Speech Synthesis In The Wild

Sep 13, 2024 (arXiv)

SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data

Aug 01, 2024 (arXiv)

Towards Robust Speech Representation Learning for Thousands of Languages

Jul 02, 2024 (arXiv)

TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages

Feb 25, 2024 (arXiv)

SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition

Jan 31, 2024 (arXiv)

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics

Jan 30, 2024 (arXiv)

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Oct 02, 2023 (arXiv)

Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech

Oct 01, 2023 (arXiv)

Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning

Sep 28, 2023 (arXiv)