
Xinsheng Wang

EDSep: An Effective Diffusion-Based Method for Speech Source Separation

Jan 27, 2025

FleSpeech: Flexibly Controllable Speech Generation with Various Prompts

Jan 08, 2025

StreamVoice+: Evolving into End-to-end Streaming Zero-shot Voice Conversion

Aug 05, 2024

SCDNet: Self-supervised Learning Feature-based Speaker Change Detection

Jun 12, 2024

StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion

Feb 07, 2024

MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling

Sep 03, 2023

UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis

Dec 06, 2022

Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints

Nov 16, 2022

Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS

Nov 02, 2022

Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis

Jul 04, 2022