Title: A Comparative Study of Self-supervised Speech Representation Based Voice Conversion

Jul 10, 2022

Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Tomoki Toda

Abstract: We present a large-scale comparative study of self-supervised speech representation (S3R)-based voice conversion (VC). In the context of recognition-synthesis VC, S3Rs are attractive because they could replace expensive supervised representations such as phonetic posteriorgrams (PPGs), which are commonly adopted by state-of-the-art VC systems. Using S3PRL-VC, an open-source VC toolkit we previously developed, we provide a series of in-depth objective and subjective analyses under three VC settings: intra-lingual any-to-one (A2O), cross-lingual A2O, and any-to-any (A2A) VC, using the Voice Conversion Challenge 2020 (VCC2020) dataset. We investigate S3R-based VC from several aspects, including model type, multilinguality, and supervision. We also study the effect of a post-discretization process with k-means clustering and show how it helps in the A2A setting. Finally, a comparison with state-of-the-art VC systems demonstrates the competitiveness of S3R-based VC and sheds light on possible directions for improvement.

* Accepted to IEEE Journal of Selected Topics in Signal Processing. arXiv admin note: substantial text overlap with arXiv:2110.06280
