Title: Lightweight Zero-shot Text-to-Speech with Mixture of Adapters

Jul 01, 2024

Kenichi Fujita, Takanori Ashihara, Marc Delcroix, Yusuke Ijima

Abstract: Advances in zero-shot text-to-speech (TTS) methods based on large-scale models have demonstrated high fidelity in reproducing speaker characteristics. However, these models are too large for practical daily use. We propose a lightweight zero-shot TTS method using a mixture of adapters (MoA). Our proposed method incorporates MoA modules into the decoder and the variance adapter of a non-autoregressive TTS model. These modules enhance the ability to adapt to a wide variety of speakers in a zero-shot manner by selecting appropriate adapters associated with speaker characteristics on the basis of speaker embeddings. Our method achieves high-quality speech synthesis with minimal additional parameters. Through objective and subjective evaluations, we confirmed that our method achieves better performance than the baseline with less than 40% of the parameters and 1.9 times faster inference. Audio samples are available on our demo page (https://ntt-hilab-gensp.github.io/is2024lightweightTTS/).
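
The abstract describes MoA modules that pick adapters according to a speaker embedding. As a rough illustration of that idea only, the following is a minimal sketch in plain Python. It is not the authors' implementation: the class name `MixtureOfAdapters`, the soft (softmax) gating, the ReLU bottleneck adapters, the residual connection, and all dimensions are assumptions made for the example, since the abstract does not specify them.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a (rows x cols) nested-list matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class MixtureOfAdapters:
    """Hypothetical MoA layer (assumed design, not taken from the paper).

    A gate maps the speaker embedding to one weight per adapter, and the
    weighted sum of the adapter outputs is added to the hidden vector.
    """

    def __init__(self, hidden_dim, spk_dim, n_adapters, bottleneck_dim, seed=0):
        rng = random.Random(seed)
        self.gate = random_matrix(n_adapters, spk_dim, rng)
        # Each adapter is a down-projection followed by an up-projection.
        self.down = [random_matrix(bottleneck_dim, hidden_dim, rng)
                     for _ in range(n_adapters)]
        self.up = [random_matrix(hidden_dim, bottleneck_dim, rng)
                   for _ in range(n_adapters)]

    def gate_weights(self, spk_emb):
        """Adapter mixing weights derived from the speaker embedding."""
        return softmax(matvec(self.gate, spk_emb))

    def forward(self, hidden, spk_emb):
        weights = self.gate_weights(spk_emb)
        out = list(hidden)  # residual path keeps the original hidden vector
        for w, down, up in zip(weights, self.down, self.up):
            z = [max(0.0, v) for v in matvec(down, hidden)]  # ReLU bottleneck
            delta = matvec(up, z)
            out = [o + w * d for o, d in zip(out, delta)]
        return out


# Usage with made-up sizes and a made-up speaker embedding.
moa = MixtureOfAdapters(hidden_dim=8, spk_dim=4, n_adapters=3, bottleneck_dim=2)
hidden = [0.5] * 8
spk = [1.0, -0.5, 0.2, 0.0]
weights = moa.gate_weights(spk)
output = moa.forward(hidden, spk)
```

Because each adapter here is a small bottleneck, adding more adapters grows the parameter count slowly relative to widening the backbone, which is one plausible reading of how such a design could stay lightweight. A hard top-k selection could replace the softmax mixing; which variant the paper uses is not stated in the abstract.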

* 5 pages, 3 figures. Accepted to INTERSPEECH 2024
