Spyros Raptis

Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling

Sep 13, 2024

Improved Text Emotion Prediction Using Combined Valence and Arousal Ordinal Classification

Apr 02, 2024

Predicting phoneme-level prosody latents using AR and flow-based Prior Networks for expressive speech synthesis

Nov 02, 2022

Generating Gender-Ambiguous Text-to-Speech Voices

Nov 01, 2022

Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis

Nov 01, 2022

Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation

Oct 31, 2022

Fine-grained Noise Control for Multispeaker Speech Synthesis

Apr 11, 2022

Self supervised learning for robust voice cloning

Apr 07, 2022

Word-Level Style Control for Expressive, Non-attentive Speech Synthesis

Nov 19, 2021

High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency

Nov 17, 2021