Mikołaj Bińkowski

End-to-End Adversarial Text-to-Speech

Jun 05, 2020

Jeff Donahue, Sander Dieleman, Mikołaj Bińkowski, Erich Elsen, Karen Simonyan

Abstract: Modern text-to-speech synthesis pipelines typically involve multiple processing stages, each of which is designed or learnt independently from the rest. In this work, we take on the challenging task of learning to synthesise speech from normalised text or phonemes in an end-to-end manner, resulting in models which operate directly on character or phoneme input sequences and produce raw speech audio outputs. Our proposed generator is feed-forward and thus efficient for both training and inference, using a differentiable monotonic interpolation scheme to predict the duration of each input token. It learns to produce high fidelity audio through a combination of adversarial feedback and prediction losses constraining the generated audio to roughly match the ground truth in terms of its total duration and mel-spectrogram. To allow the model to capture temporal variation in the generated audio, we employ soft dynamic time warping in the spectrogram-based prediction loss. The resulting model achieves a mean opinion score exceeding 4 on a 5-point scale, which is comparable to the state-of-the-art models relying on multi-stage training and additional supervision.
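
The soft dynamic time warping mentioned in the abstract can be illustrated with a minimal NumPy sketch of the generic soft-DTW recursion, in which the hard minimum of classic DTW is replaced by a smooth minimum. This is not the paper's exact loss: the squared Euclidean frame cost, the `gamma` smoothing parameter, and the function names are assumptions made for illustration.

```python
import numpy as np


def soft_min(values, gamma):
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    v = -np.asarray(values, dtype=float) / gamma
    m = v.max()
    return -gamma * (m + np.log(np.exp(v - m).sum()))


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW cost between sequences x of shape (n, d) and y of shape (m, d)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    # Pairwise squared Euclidean distances between frames (assumed cost).
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    # r[i, j] holds the soft alignment cost of x[:i] against y[:j].
    r = np.full((n + 1, m + 1), np.inf)
    r[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            r[i, j] = cost[i - 1, j - 1] + soft_min(
                [r[i - 1, j], r[i, j - 1], r[i - 1, j - 1]], gamma
            )
    return r[n, m]
```

With a small `gamma` the result approaches the classic DTW cost, so a time-stretched copy of a sequence scores close to zero, while larger values of `gamma` give a smoother objective.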

High Fidelity Speech Synthesis with Adversarial Networks

Sep 26, 2019

Mikołaj Bińkowski, Jeff Donahue, Sander Dieleman, Aidan Clark, Erich Elsen, Norman Casagrande, Luis C. Cobo, Karen Simonyan

Abstract: Generative adversarial networks have seen rapid development in recent years and have led to remarkable improvements in generative modelling of images. However, their application in the audio domain has received limited attention, and autoregressive models, such as WaveNet, remain the state of the art in generative modelling of audio signals such as human speech. To address this paucity, we introduce GAN-TTS, a Generative Adversarial Network for Text-to-Speech. Our architecture is composed of a conditional feed-forward generator producing raw speech audio, and an ensemble of discriminators which operate on random windows of different sizes. The discriminators analyse the audio both in terms of general realism, as well as how well the audio corresponds to the utterance that should be pronounced. To measure the performance of GAN-TTS, we employ both subjective human evaluation (MOS - Mean Opinion Score), as well as novel quantitative metrics (Fréchet DeepSpeech Distance and Kernel DeepSpeech Distance), which we find to be well correlated with MOS. We show that GAN-TTS is capable of generating high-fidelity speech with naturalness comparable to the state-of-the-art models, and unlike autoregressive models, it is highly parallelisable thanks to an efficient feed-forward generator. Listen to GAN-TTS reading this abstract at https://storage.googleapis.com/deepmind-media/research/abstract.wav.
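
As a rough illustration of discriminators operating on random windows of different sizes, the sketch below draws one random crop per window size from a waveform. The function name and the window sizes used in the test are hypothetical and are not taken from the paper; in a full model each crop would be passed to its own discriminator.

```python
import numpy as np


def sample_random_windows(audio, window_sizes, rng):
    """Return one randomly positioned crop of `audio` for each window size."""
    windows = {}
    for size in window_sizes:
        # Choose a start index so that the whole window fits inside the signal.
        start = rng.integers(0, len(audio) - size + 1)
        windows[size] = audio[start:start + size]
    return windows
```

Because every discriminator only sees a short crop, its cost does not grow with the length of the utterance.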

Batch weight for domain adaptation with mass shift

May 29, 2019

Mikołaj Bińkowski, R Devon Hjelm, Aaron Courville

Abstract: Unsupervised domain transfer is the task of transferring or translating samples from a source distribution to a different target distribution. Current solutions to unsupervised domain transfer often operate on data in which the modes of the distributions are well-matched, for instance when the source and target distributions have the same class frequencies. However, these models do not perform well when the modes are not well-matched, as would be the case when samples are drawn independently from two different, but related, domains. This mode imbalance is problematic as generative adversarial networks (GANs), a successful approach in this setting, are sensitive to mode frequency, which results in a mismatch of semantics between source samples and generated samples of the target distribution. We propose a principled method of re-weighting training samples to correct for such mass shift between the transferred distributions, which we call batch-weight. We also provide a rigorous probabilistic setting for domain transfer and a new simplified objective for training transfer networks, an alternative to the complex, multi-component loss functions used in the current state-of-the-art image-to-image translation models. The new objective stems from the discrimination of joint distributions and enforces cycle-consistency in an abstract, high-level, rather than pixel-wise, sense. Lastly, we experimentally show the effectiveness of the proposed methods in several image-to-image translation tasks.

On gradient regularizers for MMD GANs

Oct 27, 2018

Michael Arbel, Dougal J. Sutherland, Mikołaj Bińkowski, Arthur Gretton

Abstract: We propose a principled method for gradient-based regularization of the critic of GAN-like models trained by adversarially optimizing the kernel of a Maximum Mean Discrepancy (MMD). We show that controlling the gradient of the critic is vital to having a sensible loss function, and devise a method to enforce exact, analytical gradient constraints at no additional cost compared to existing approximate techniques based on additive regularizers. The new loss function is provably continuous, and experiments show that it stabilizes and accelerates training, giving image generation models that outperform state-of-the-art methods on $160 \times 160$ CelebA and $64 \times 64$ unconditional ImageNet.

* Code available at https://github.com/MichaelArbel/Scaled-MMD-GAN . v2: NIPS camera-ready version

Autoregressive Convolutional Neural Networks for Asynchronous Time Series

Jun 12, 2018

Mikołaj Bińkowski, Gautier Marti, Philippe Donnat

Abstract: We propose Significance-Offset Convolutional Neural Network, a deep convolutional network architecture for regression of multivariate asynchronous time series. The model is inspired by standard autoregressive (AR) models and gating mechanisms used in recurrent neural networks. It involves an AR-like weighting system, where the final predictor is obtained as a weighted sum of adjusted regressors, while the weights are data-dependent functions learnt through a convolutional network. The architecture was designed for applications on asynchronous time series and is evaluated on such datasets: a hedge fund proprietary dataset of over 2 million quotes for a credit derivative index, an artificially generated noisy autoregressive series and the UCI household electricity consumption dataset. The proposed architecture achieves promising results as compared to convolutional and recurrent neural networks.

* Proceedings of The 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 2018, to appear
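
The AR-like weighting described in the abstract, a weighted sum of adjusted regressors with data-dependent weights, can be sketched as follows. This is a simplified illustration and not the authors' implementation: in the paper the weights and adjustments are learnt by convolutional networks, whereas here `offsets` and `significance` are plain arrays passed in by the caller, and the softmax normalisation over past steps is an assumption.

```python
import numpy as np


def weighted_sum_predictor(values, offsets, significance):
    """Combine past observations into one prediction per feature.

    values, offsets, significance: arrays of shape (steps, features).
    Each past value is adjusted by its offset, and the adjusted values are
    averaged with softmax weights computed from the significance scores.
    """
    values = np.asarray(values, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    significance = np.asarray(significance, dtype=float)
    # Softmax over the time axis, shifted by the maximum for stability.
    shifted = significance - significance.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * (values + offsets)).sum(axis=0)
```

With zero offsets and equal significance scores the predictor reduces to the plain mean of the past values; a much larger score on one step makes the prediction follow that step.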

Demystifying MMD GANs

Mar 21, 2018

Mikołaj Bińkowski, Dougal J. Sutherland, Michael Arbel, Arthur Gretton

Abstract: We investigate the training and performance of generative adversarial networks using the Maximum Mean Discrepancy (MMD) as critic, termed MMD GANs. As our main theoretical contribution, we clarify the situation with bias in GAN loss functions raised by recent work: we show that gradient estimators used in the optimization process for both MMD GANs and Wasserstein GANs are unbiased, but learning a discriminator based on samples leads to biased gradients for the generator parameters. We also discuss the issue of kernel choice for the MMD critic, and characterize the kernel corresponding to the energy distance used for the Cramer GAN critic. Being an integral probability metric, the MMD benefits from training strategies recently developed for Wasserstein GANs. In experiments, the MMD GAN is able to employ a smaller critic network than the Wasserstein GAN, resulting in a simpler and faster-training algorithm with matching performance. We also propose an improved measure of GAN convergence, the Kernel Inception Distance, and show how to use it to dynamically adapt learning rates during GAN training.

* Published at ICLR 2018: https://openreview.net/forum?id=r1lUOzWCW . v4: actually-final version: non-existence of unbiased estimators for IPMs and FID; clarity edits to the main proof
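
For readers unfamiliar with the quantity behind the Kernel Inception Distance, the following is a minimal sketch of the standard unbiased estimator of the squared MMD with a cubic polynomial kernel. It assumes feature vectors are already available as NumPy arrays; extracting Inception features is outside the sketch, and the function names are illustrative rather than taken from the authors' code.

```python
import numpy as np


def polynomial_kernel(a, b, degree=3, coef0=1.0):
    """k(x, y) = (x . y / d + coef0) ** degree, with d the feature dimension."""
    d = a.shape[1]
    return (a @ b.T / d + coef0) ** degree


def mmd2_unbiased(x, y, kernel=polynomial_kernel):
    """Unbiased estimate of squared MMD between samples x (m, d) and y (n, d)."""
    m, n = len(x), len(y)
    k_xx = kernel(x, x)
    k_yy = kernel(y, y)
    k_xy = kernel(x, y)
    # Exclude diagonal terms so each within-sample average runs over i != j.
    mean_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    mean_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return mean_xx + mean_yy - 2.0 * k_xy.mean()
```

Because the estimator is unbiased, it can come out slightly negative when the two samples are drawn from the same distribution.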
