Picture for Xinfa Zhu

Xinfa Zhu

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

Add code
Feb 06, 2025
Viaarxiv icon

CosyAudio: Improving Audio Generation with Confidence Scores and Synthetic Captions

Add code
Jan 28, 2025
Viaarxiv icon

ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training

Add code
Jan 08, 2025
Figure 1 for ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training
Figure 2 for ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training
Figure 3 for ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training
Figure 4 for ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training
Viaarxiv icon

Autoregressive Speech Synthesis with Next-Distribution Prediction

Add code
Dec 22, 2024
Viaarxiv icon

YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls

Add code
Dec 12, 2024
Figure 1 for YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls
Figure 2 for YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls
Figure 3 for YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls
Figure 4 for YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls
Viaarxiv icon

CoDiff-VC: A Codec-Assisted Diffusion Model for Zero-shot Voice Conversion

Add code
Dec 03, 2024
Figure 1 for CoDiff-VC: A Codec-Assisted Diffusion Model for Zero-shot Voice Conversion
Figure 2 for CoDiff-VC: A Codec-Assisted Diffusion Model for Zero-shot Voice Conversion
Figure 3 for CoDiff-VC: A Codec-Assisted Diffusion Model for Zero-shot Voice Conversion
Figure 4 for CoDiff-VC: A Codec-Assisted Diffusion Model for Zero-shot Voice Conversion
Viaarxiv icon

The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge

Add code
Oct 31, 2024
Viaarxiv icon

Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy

Add code
Jun 14, 2024
Viaarxiv icon

Text-aware and Context-aware Expressive Audiobook Speech Synthesis

Add code
Jun 12, 2024
Viaarxiv icon

Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation

Add code
Jun 11, 2024
Viaarxiv icon