Takaaki Saeki

YODAS: Youtube-Oriented Dataset for Audio and Speech

Jun 02, 2024

Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data

Feb 29, 2024

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics

Jan 30, 2024

Diversity-based core-set selection for text-to-speech with linguistic and acoustic features

Sep 15, 2023

Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech

Feb 27, 2023

Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining

Feb 05, 2023

SpeechLMScore: Evaluating speech generation using speech language model

Dec 08, 2022

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech

Oct 27, 2022

Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection

Oct 26, 2022

Spontaneous speech synthesis with linguistic-speech consistency training using pseudo-filled pauses

Oct 18, 2022