Ammar Abbas

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

Feb 15, 2024

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

Jul 31, 2023

Controllable Emphasis with zero data for text-to-speech

Jul 13, 2023

eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer

Jun 20, 2023

Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody

Jun 29, 2022

Expressive, Variable, and Controllable Duration Modelling in TTS

Jun 28, 2022

CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer

Jun 27, 2022

Multi-Scale Spectrogram Modelling for Neural Text-to-Speech

Jun 29, 2021

A learned conditional prior for the VAE acoustic space of a TTS system

Jun 14, 2021

Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech

Nov 04, 2020