Title: Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Jun 01, 2023

Hubert Siuzdak

Abstract: Recent advancements in neural vocoding are predominantly driven by Generative Adversarial Networks (GANs) operating in the time domain. While effective, this approach neglects the inductive bias offered by time-frequency representations, resulting in redundant and computationally intensive upsampling operations. Fourier-based time-frequency representation is an appealing alternative, aligning more accurately with human auditory perception and benefiting from well-established fast algorithms for its computation. Nevertheless, direct reconstruction of complex-valued spectrograms has been historically problematic, primarily due to phase recovery issues. This study seeks to close this gap by presenting Vocos, a new model that addresses the key challenges of modeling spectral coefficients. Vocos demonstrates improved computational efficiency, achieving an order of magnitude increase in speed compared to prevailing time-domain neural vocoding approaches. As shown by objective evaluation, Vocos not only matches state-of-the-art audio quality but, thanks to its frequency-aware generator, also effectively mitigates the periodicity issues frequently associated with time-domain GANs. The source code and model weights have been open-sourced at https://github.com/charactr-platform/vocos.
