Xinfa Zhu

ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training

Jan 08, 2025

Autoregressive Speech Synthesis with Next-Distribution Prediction

Dec 22, 2024

YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls

Dec 12, 2024

CoDiff-VC: A Codec-Assisted Diffusion Model for Zero-shot Voice Conversion

Dec 03, 2024

The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge

Oct 31, 2024

Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy

Jun 14, 2024

Text-aware and Context-aware Expressive Audiobook Speech Synthesis

Jun 12, 2024

Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation

Jun 11, 2024

Accent-VITS: accent transfer for end-to-end TTS

Dec 29, 2023

SELM: Speech Enhancement Using Discrete Tokens and Language Models

Dec 15, 2023