Thomas Drugman

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

Feb 15, 2024
Via arXiv

A Comparative Analysis of Pretrained Language Models for Text-to-Speech

Sep 04, 2023
Via arXiv

Controllable Emphasis with zero data for text-to-speech

Jul 13, 2023
Via arXiv

eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer

Jun 20, 2023
Via arXiv

Computer-assisted Pronunciation Training -- Speech synthesis is almost all you need

Jul 02, 2022
Via arXiv

Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody

Jun 29, 2022
Via arXiv

Expressive, Variable, and Controllable Duration Modelling in TTS

Jun 28, 2022
Via arXiv

CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer

Jun 27, 2022
Via arXiv

Distribution augmentation for low-resource expressive text-to-speech

Feb 19, 2022
Via arXiv

Multi-Scale Spectrogram Modelling for Neural Text-to-Speech

Jun 29, 2021
Via arXiv