Shu-Wen Yang

A Comparative Study of Self-supervised Speech Representation Based Voice Conversion

Jul 10, 2022

Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Tomoki Toda

Abstract: We present a large-scale comparative study of self-supervised speech representation (S3R)-based voice conversion (VC). In the context of recognition-synthesis VC, S3Rs are attractive owing to their potential to replace expensive supervised representations such as phonetic posteriorgrams (PPGs), which are commonly adopted by state-of-the-art VC systems. Using S3PRL-VC, an open-source VC software package we previously developed, we provide a series of in-depth objective and subjective analyses under three VC settings: intra-lingual any-to-one (A2O), cross-lingual A2O, and any-to-any (A2A) VC, using the Voice Conversion Challenge 2020 (VCC2020) dataset. We investigated S3R-based VC in various aspects, including model type, multilinguality, and supervision. We also studied the effect of a post-discretization process with k-means clustering and showed how it improves results in the A2A setting. Finally, the comparison with state-of-the-art VC systems demonstrates the competitiveness of S3R-based VC and also sheds light on possible directions for improvement.
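The post-discretization step with k-means clustering mentioned in the abstract can be pictured roughly as follows. This is only a minimal sketch, not the S3PRL-VC implementation: the abstract does not say which features are clustered or how many clusters are used, so the random stand-in features, the sizes, and the helper names `kmeans_fit`, `assign`, and `discretize` are all assumptions made for illustration.

```python
import numpy as np


def kmeans_fit(features, num_clusters, num_iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (num_clusters, dim) codebook."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from randomly chosen distinct frames.
    init_idx = rng.choice(len(features), size=num_clusters, replace=False)
    codebook = features[init_idx].copy()
    for _ in range(num_iters):
        codes = assign(features, codebook)
        for k in range(num_clusters):
            members = features[codes == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                codebook[k] = members.mean(axis=0)
    return codebook


def assign(features, codebook):
    """Index of the nearest centroid (Euclidean distance) for every frame."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1)


def discretize(features, codebook):
    """Replace each continuous frame by its nearest centroid."""
    codes = assign(features, codebook)
    return codes, codebook[codes]


# Stand-in for frame-level S3R features: 200 frames, 16 dimensions
# (hypothetical sizes; real features would come from a pretrained S3R model).
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 16)).astype(np.float32)

codebook = kmeans_fit(frames, num_clusters=8)
codes, quantized = discretize(frames, codebook)
```

Here `codes` holds one cluster index per frame and `quantized` holds the corresponding centroid vectors, so a downstream synthesizer would see at most `num_clusters` distinct input vectors instead of a continuous feature stream.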

* Accepted to IEEE Journal of Selected Topics in Signal Processing. arXiv admin note: substantial text overlap with arXiv:2110.06280

S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations

Oct 12, 2021

Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe, Tomoki Toda

Abstract: This paper introduces S3PRL-VC, an open-source voice conversion (VC) framework based on the S3PRL toolkit. In the context of recognition-synthesis VC, self-supervised speech representation (S3R) is valuable for its potential to replace the expensive supervised representation adopted by state-of-the-art VC systems. Moreover, we claim that VC is a good probing task for S3R analysis. In this work, we provide a series of in-depth analyses by benchmarking on the two tasks in VCC2020, namely intra-/cross-lingual any-to-one (A2O) VC, as well as in an any-to-any (A2A) setting. We compare not only different S3Rs with one another but also S3Rs with the top VCC2020 systems that use supervised representations. Systematic objective and subjective evaluations were conducted, and we show that S3R is comparable with VCC2020 top systems in the A2O setting in terms of similarity, and achieves state-of-the-art performance in S3R-based A2A VC. We believe the extensive analysis and the toolkit itself contribute not only to the S3R community but also to the VC community. The codebase is now open-sourced.

* Submitted to ICASSP 2022. Code available at: https://github.com/s3prl/s3prl/tree/master/s3prl/downstream/a2o-vc-vcc2020
