Title: Unsupervised Cross-Domain Speech-to-Speech Conversion with Time-Frequency Consistency

May 19, 2020

Mohammad Asif Khan, Fabien Cardinaux, Stefan Uhlich, Marc Ferras, Asja Fischer

Abstract: In recent years, generative adversarial network (GAN) based models have been successfully applied to unsupervised speech-to-speech conversion. The rich, compact harmonic view offered by the magnitude spectrogram makes it a suitable representation for training these models on audio data. To reconstruct the speech signal, the neural network first generates a magnitude spectrogram, and a method such as the Griffin-Lim algorithm then estimates the corresponding phase spectrogram. The problem with this procedure is that the generated magnitude spectrogram may not be consistent: there may be no phase for which the full complex spectrogram is the short-time Fourier transform (STFT) of an actual waveform, and consistency is required for finding a phase that yields a natural-sounding speech waveform. In this work, we address this problem by proposing a condition that encourages spectrogram consistency during adversarial training. We demonstrate our approach on the task of converting the voice of a male speaker into that of a female speaker, and vice versa. Our experimental results on the LibriSpeech corpus show that the model trained with the time-frequency (TF) consistency condition achieves perceptually better speech-to-speech conversion quality.
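The consistency issue described in the abstract can be made concrete with a small NumPy sketch. It is an illustration under stated assumptions, not the paper's method: the abstract does not give the STFT settings or the exact form of the consistency condition, so the periodic Hann window, `n_fft=256`, `hop=64`, the 16 kHz test tone, and the helper names `stft`, `istft`, `inconsistency` and `griffin_lim` are all hypothetical choices. A complex spectrogram X is treated as consistent when STFT(iSTFT(X)) is approximately X, and the classic Griffin-Lim iteration alternates between these two maps while holding the target magnitude fixed.

```python
import numpy as np

# Illustrative settings (assumed, not taken from the paper).
N_FFT = 256  # frame length in samples; must be even for rfft/irfft
HOP = 64     # hop size in samples (75% overlap)


def _window(n_fft):
    # Periodic Hann window.
    return np.hanning(n_fft + 1)[:-1]


def stft(x, n_fft=N_FFT, hop=HOP):
    """Complex STFT of a real signal, shape (n_fft // 2 + 1, n_frames)."""
    win = _window(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T


def istft(X, n_fft=N_FFT, hop=HOP):
    """Least-squares inverse STFT via windowed overlap-add."""
    win = _window(n_fft)
    frames = np.fft.irfft(X.T, n=n_fft, axis=1) * win
    length = n_fft + hop * (frames.shape[0] - 1)
    y = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += win ** 2
    # Guard against division by zero where the window sum vanishes.
    return y / np.maximum(norm, 1e-8)


def inconsistency(X, n_fft=N_FFT, hop=HOP):
    """Relative distance between X and STFT(iSTFT(X)); about 0 for a consistent X."""
    X_back = stft(istft(X, n_fft, hop), n_fft, hop)
    return np.linalg.norm(X_back - X) / np.linalg.norm(X)


def griffin_lim(mag, n_iter=100, n_fft=N_FFT, hop=HOP, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `mag`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop)                  # map to a time signal
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop)))  # keep only its phase
    return istft(mag * phase, n_fft, hop)


if __name__ == "__main__":
    t = np.arange(N_FFT + HOP * 39) / 16000.0  # 40 frames at an assumed 16 kHz rate
    X = stft(np.sin(2 * np.pi * 440.0 * t))
    print(inconsistency(X))  # near 0: X is the STFT of a real signal
    rng = np.random.default_rng(1)
    noise = rng.standard_normal(X.shape) + 1j * rng.standard_normal(X.shape)
    print(inconsistency(noise))  # well above 0: not the STFT of any signal
    y = griffin_lim(np.abs(X))
    err = np.linalg.norm(np.abs(stft(y)) - np.abs(X)) / np.linalg.norm(np.abs(X))
    print(err)  # relative magnitude error after Griffin-Lim
```

A spectrogram computed from a real waveform has near-zero inconsistency, while an arbitrary complex array of the same shape does not. If a generator outputs a magnitude spectrogram for which no consistent phase exists, Griffin-Lim cannot drive its magnitude error to zero, which is the situation the TF consistency condition is meant to discourage.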
