Title: RAVE: A variational autoencoder for fast and high-quality neural audio synthesis

Nov 09, 2021

Antoine Caillon, Philippe Esling

Abstract: Deep generative models applied to audio have improved the state of the art by a large margin in many speech- and music-related tasks. However, as raw waveform modelling remains an inherently difficult task, audio generative models are either computationally intensive, rely on low sampling rates, are complicated to control, or restrict the nature of possible signals. Among those models, Variational AutoEncoders (VAE) give control over the generation by exposing latent variables, although they usually suffer from low synthesis quality. In this paper, we introduce a Realtime Audio Variational autoEncoder (RAVE) allowing both fast and high-quality audio waveform synthesis. We introduce a novel two-stage training procedure, namely representation learning and adversarial fine-tuning. We show that a post-training analysis of the latent space allows direct control over the trade-off between reconstruction fidelity and representation compactness. By leveraging a multi-band decomposition of the raw waveform, we show that our model is the first able to generate 48 kHz audio signals while simultaneously running 20 times faster than real-time on a standard laptop CPU. We evaluate synthesis quality using both quantitative and qualitative subjective experiments and show the superiority of our approach compared to existing models. Finally, we present applications of our model for timbre transfer and signal compression. All of our source code and audio examples are publicly available.
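
To make the speed claim in the abstract concrete, the short sketch below works through what "20 times faster than real-time" at 48 kHz implies as a time budget and a sample throughput. The function names are hypothetical and chosen only for illustration; they do not come from the RAVE code base, and the numbers are simple arithmetic on the figures quoted in the abstract rather than measurements.

```python
def real_time_factor(num_samples: int, sample_rate_hz: int, elapsed_s: float) -> float:
    """Audio duration divided by the wall-clock time spent synthesizing it."""
    if sample_rate_hz <= 0 or elapsed_s <= 0:
        raise ValueError("sample_rate_hz and elapsed_s must be positive")
    audio_duration_s = num_samples / sample_rate_hz
    return audio_duration_s / elapsed_s


def time_budget_s(num_samples: int, sample_rate_hz: int, factor: float) -> float:
    """Wall-clock time allowed to synthesize num_samples at a given real-time factor."""
    if sample_rate_hz <= 0 or factor <= 0:
        raise ValueError("sample_rate_hz and factor must be positive")
    return num_samples / sample_rate_hz / factor


def required_throughput(sample_rate_hz: int, factor: float) -> float:
    """Samples that must be produced per wall-clock second to reach the factor."""
    return sample_rate_hz * factor


SAMPLE_RATE_HZ = 48_000  # sampling rate quoted in the abstract
FACTOR = 20.0            # "20 times faster than real-time"

# One second of audio is SAMPLE_RATE_HZ samples.
budget = time_budget_s(SAMPLE_RATE_HZ, SAMPLE_RATE_HZ, FACTOR)
throughput = required_throughput(SAMPLE_RATE_HZ, FACTOR)

print(f"time budget per second of audio: {budget * 1000:.1f} ms")
print(f"required throughput: {throughput:,.0f} samples/s")
```

Read this way, the claim means that one second of 48 kHz audio has to be synthesized in roughly 50 ms of CPU time. To check a model against it, time a synthesis call and pass the sample count and elapsed time to `real_time_factor`.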

View paper on OpenReview
