Xinsheng Wang

Steering Language Model to Stable Speech Emotion Recognition via Contextual Perception and Chain of Thought

Feb 25, 2025

Audio-FLAN: A Preliminary Release

Feb 23, 2025

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

Feb 06, 2025

CosyAudio: Improving Audio Generation with Confidence Scores and Synthetic Captions

Jan 28, 2025

EDSep: An Effective Diffusion-Based Method for Speech Source Separation

Jan 27, 2025

FleSpeech: Flexibly Controllable Speech Generation with Various Prompts

Jan 08, 2025

StreamVoice+: Evolving into End-to-end Streaming Zero-shot Voice Conversion

Aug 05, 2024

SCDNet: Self-supervised Learning Feature-based Speaker Change Detection

Jun 12, 2024

StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion

Feb 07, 2024

MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling

Sep 03, 2023