Zhengyang Chen

Advanced Zero-Shot Text-to-Speech for Background Removal and Preservation with Controllable Masked Speech Prediction

Feb 11, 2025

Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification

Oct 22, 2024

Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion

Sep 10, 2024

Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching

Sep 07, 2024

Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning

Jul 21, 2024

Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems

Jun 13, 2024

Target Speech Diarization with Multimodal Prompts

Jun 11, 2024

Prompt-driven Target Speech Diarization

Oct 23, 2023

Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition

Sep 27, 2023

Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer

Sep 13, 2023