Jaime Lorenzo-Trueba

Enhancing the Stability of LLM-based Speech Generation Systems through Self-Supervised Representations

Feb 05, 2024
Via arXiv

Multilingual context-based pronunciation learning for Text-to-Speech

Jul 31, 2023
Via arXiv

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

Jul 31, 2023
Via arXiv

Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings

Jul 31, 2023
Via arXiv

Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation

Jul 29, 2022
Via arXiv

Computer-assisted Pronunciation Training -- Speech synthesis is almost all you need

Jul 02, 2022
Via arXiv

Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module

Feb 16, 2022
Via arXiv

Cross-speaker style transfer for text-to-speech using data augmentation

Feb 10, 2022
Via arXiv

Enhancing audio quality for expressive Neural Text-to-Speech

Aug 13, 2021
Via arXiv

Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments

Jun 16, 2021
Via arXiv