Berrak Sisman

DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech

Oct 17, 2024

Discrete Unit based Masking for Improving Disentanglement in Voice Conversion

Sep 17, 2024

SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection

Aug 30, 2024

PRESENT: Zero-Shot Text-to-Prosody Control

Aug 13, 2024

We Need Variations in Speech Synthesis: Sub-center Modelling for Speaker Embeddings

Jul 05, 2024

Towards Naturalistic Voice Conversion: NaturalVoices Dataset with an Automatic Processing Pipeline

Jun 06, 2024

Style Mixture of Experts for Expressive Text-To-Speech Synthesis

Jun 05, 2024

Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training

Jun 03, 2024

Exploring speech style spaces with language models: Emotional TTS without emotion labels

May 18, 2024

Converting Anyone's Voice: End-to-End Expressive Voice Conversion with a Conditional Diffusion Model

May 02, 2024