Xinsheng Wang

StreamVoice+: Evolving into End-to-end Streaming Zero-shot Voice Conversion

Aug 05, 2024
Via arXiv

SCDNet: Self-supervised Learning Feature-based Speaker Change Detection

Jun 12, 2024
Via arXiv

StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion

Feb 07, 2024
Via arXiv

MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling

Sep 03, 2023
Via arXiv

UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis

Dec 06, 2022
Via arXiv

Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints

Nov 16, 2022
Via arXiv

Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS

Nov 02, 2022
Via arXiv

Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis

Jul 04, 2022
Via arXiv

AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation

Jun 01, 2022
Via arXiv

Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher

Mar 30, 2022
Via arXiv