Title: Fine-Grained and Interpretable Neural Speech Editing

Jul 07, 2024

Max Morrison, Cameron Churchwell, Nathan Pruyne, Bryan Pardo

Figure 1 for Fine-Grained and Interpretable Neural Speech Editing

Figure 2 for Fine-Grained and Interpretable Neural Speech Editing

Abstract: Fine-grained editing of speech attributes, such as prosody (i.e., the pitch, loudness, and phoneme durations), pronunciation, speaker identity, and formants, is useful for fine-tuning and fixing imperfections in human and AI-generated speech recordings for creation of podcasts, film dialogue, and video game dialogue. Existing speech synthesis systems use representations that entangle two or more of these attributes, prohibiting their use in fine-grained, disentangled editing. In this paper, we demonstrate the first disentangled and interpretable representation of speech with comparable subjective and objective vocoding reconstruction accuracy to Mel spectrograms. Our interpretable representation, combined with our proposed data augmentation method, enables training an existing neural vocoder to perform fast, accurate, and high-quality editing of pitch, duration, volume, timbral correlates of volume, pronunciation, speaker identity, and spectral balance.

* Interspeech 2024
