Huali Zhou

KunquDB: An Attempt for Speaker Verification in the Chinese Opera Scenario

Mar 20, 2024

Huali Zhou, Yuke Lin, Dong Liu, Ming Li

Abstract: This work aims to promote Chinese opera research in both the musical and speech domains, with a primary focus on overcoming data limitations. We introduce KunquDB, a relatively large-scale, well-annotated audio-visual dataset comprising 339 speakers and 128 hours of content. Drawn from the Kunqu Opera Art Canon (Kunqu yishu dadian), KunquDB is structured by dialogue line and provides explicit annotations, including character names, speaker names, gender information, and vocal manner classifications, along with preliminary text transcriptions. KunquDB provides a versatile foundation for role-centric acoustic studies and for speech-related research, including Automatic Speaker Verification (ASV). Beyond enriching opera research, this dataset bridges the gap between artistic expression and technological innovation. As a first exploration of ASV in Chinese opera, we construct four test trials that account for two distinct vocal manners in opera voices: stage speech (ST) and singing (S). Domain adaptation methods effectively mitigate the domain mismatch induced by these vocal manner variations, although there is still room for further improvement on this benchmark.
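
The abstract does not say how the four test trials are defined. One plausible reading, assumed here and not confirmed by the source, is that they correspond to the four enrollment/test combinations of the two vocal manners (ST-ST, ST-S, S-ST, S-S). The sketch below groups trial pairs that way; the utterance IDs, speaker labels, and the `build_trials` helper are hypothetical and are not part of KunquDB.

```python
from itertools import product

# Hypothetical toy utterance list: (utterance_id, speaker, vocal_manner).
# "ST" = stage speech, "S" = singing, following the abstract's labels.
UTTERANCES = [
    ("u1", "spk_a", "ST"),
    ("u2", "spk_a", "S"),
    ("u3", "spk_b", "ST"),
    ("u4", "spk_b", "S"),
]


def build_trials(utterances):
    """Group enrollment/test pairs by (enrollment manner, test manner).

    Assumption: the four trial lists are the four manner combinations.
    Each trial is (enroll_id, test_id, is_target), where is_target is
    True when both utterances come from the same speaker.
    """
    trials = {pair: [] for pair in product(("ST", "S"), repeat=2)}
    for enroll, test in product(utterances, repeat=2):
        if enroll[0] == test[0]:
            continue  # never pair an utterance with itself
        is_target = enroll[1] == test[1]
        trials[(enroll[2], test[2])].append((enroll[0], test[0], is_target))
    return trials


if __name__ == "__main__":
    for manners, pairs in build_trials(UTTERANCES).items():
        print(manners, len(pairs))
```

Keeping the cross-manner lists (ST-S and S-ST) separate from the matched ones makes it possible to measure how much verification accuracy drops when enrollment and test utterances differ in vocal manner, which is the mismatch the abstract attributes to domain shift.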

BiSinger: Bilingual Singing Voice Synthesis

Sep 29, 2023

Huali Zhou, Yueqian Lin, Yao Shi, Peng Sun, Ming Li

Abstract: Although Singing Voice Synthesis (SVS) has made great strides with Text-to-Speech (TTS) techniques, multilingual singing voice modeling remains relatively unexplored. This paper presents BiSinger, a bilingual pop SVS system for English and Mandarin Chinese. Current systems require a separate model per language and cannot accurately represent both Chinese and English, which hinders code-switch SVS. To address this gap, we design a shared representation between Chinese and English singing voices, achieved by using the CMU dictionary with mapping rules. We fuse monolingual singing datasets with open-source singing voice conversion techniques to generate bilingual singing voices, and we also explore the potential use of bilingual speech data. Experiments affirm that our language-independent representation and the incorporation of related datasets enable a single model with enhanced performance in English and code-switch SVS while maintaining Chinese song performance. Audio samples are available at https://bisinger-svs.github.io.

* Accepted by ASRU2023
