Title: Perception of prosodic variation for speech synthesis using an unsupervised discrete representation of F0

Mar 14, 2020

Zack Hodari, Catherine Lai, Simon King

Abstract: In English, prosody adds a broad range of information to segment sequences, from information structure (e.g. contrast) to stylistic variation (e.g. expression of emotion). However, when learning to control prosody in text-to-speech voices, it is not clear what exactly the control is modifying. Existing research on discrete representation learning for prosody has demonstrated high naturalness, but no analysis has been performed on what these representations capture, or if they can generate meaningfully-distinct variants of an utterance. We present a phrase-level variational autoencoder with a multi-modal prior, using the mode centres as "intonation codes". Our evaluation establishes which intonation codes are perceptually distinct, finding that the intonation codes from our multi-modal latent model were significantly more distinct than a baseline using k-means clustering. We carry out a follow-up qualitative study to determine what information the codes are carrying. Most commonly, listeners commented on the intonation codes having a statement or question style. However, many other affect-related styles were also reported, including: emotional, uncertain, surprised, sarcastic, passive aggressive, and upset.

* Published to the 10th ISCA International Conference on Speech Prosody (SP2020)
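
As a rough illustration of the k-means baseline idea mentioned in the abstract, the sketch below clusters fixed-length phrase-level vectors and treats the cluster centroids as discrete codes. This is not the authors' implementation: the function name `kmeans_codes`, the two-dimensional synthetic vectors standing in for phrase-level F0 representations, and the choice of three codes are all assumptions made for the example.

```python
import numpy as np


def kmeans_codes(embeddings, n_codes, n_iters=50, seed=0):
    """Plain k-means (Lloyd's algorithm): returns (centroids, assignments).

    Hypothetical helper for illustration; the centroids play the role
    of discrete "codes" for the input vectors.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from distinct, randomly chosen data points.
    init_idx = rng.choice(len(embeddings), size=n_codes, replace=False)
    centroids = embeddings[init_idx].copy()

    def nearest(points, centres):
        # Squared Euclidean distance from every point to every centre.
        dists = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        return dists.argmin(axis=1)

    for _ in range(n_iters):
        assignments = nearest(embeddings, centroids)
        new_centroids = centroids.copy()
        for k in range(n_codes):
            members = embeddings[assignments == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[k] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    # Recompute assignments so they match the returned centroids.
    return centroids, nearest(embeddings, centroids)


# Synthetic stand-in data (an assumption): three groups of 2-D vectors.
rng = np.random.default_rng(0)
demo_embeddings = np.concatenate([
    rng.normal(loc=(-5.0, 0.0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0.0, 5.0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5.0, 0.0), scale=0.5, size=(50, 2)),
])
codes, code_ids = kmeans_codes(demo_embeddings, n_codes=3)
print(codes.shape, code_ids.shape)
```

In the paper's setting, the codes come instead from the mode centres of the multi-modal prior of a phrase-level variational autoencoder; the clustering above only mirrors the baseline in spirit.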
