Alexandra Vioni

Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling

Sep 13, 2024

Improved Text Emotion Prediction Using Combined Valence and Arousal Ordinal Classification

Apr 02, 2024

Low-Resource Cross-Domain Singing Voice Synthesis via Reduced Self-Supervised Speech Representations

Feb 02, 2024

Controllable speech synthesis by learning discrete phoneme-level prosodic representations

Nov 29, 2022

Investigating Content-Aware Neural Text-To-Speech MOS Prediction Using Prosodic and Linguistic Features

Nov 01, 2022

SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis

Apr 06, 2022

Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis

Nov 19, 2021

Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control

Nov 19, 2021

Rapping-Singing Voice Synthesis based on Phoneme-level Prosody Control

Nov 17, 2021