Xinfa Zhu

The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge

Oct 31, 2024

Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy

Jun 14, 2024

Text-aware and Context-aware Expressive Audiobook Speech Synthesis

Jun 12, 2024

Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation

Jun 11, 2024

Accent-VITS: accent transfer for end-to-end TTS

Dec 29, 2023

SELM: Speech Enhancement Using Discrete Tokens and Language Models

Dec 15, 2023

SponTTS: modeling and transferring spontaneous style for TTS

Nov 13, 2023

Multi-Speaker Expressive Speech Synthesis via Semi-supervised Contrastive Learning

Oct 26, 2023

Vec-Tok Speech: speech vectorization and tokenization for neural speech generation

Oct 12, 2023

U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning

Oct 06, 2023