Picture for Yanmin Qian

Yanmin Qian

SLIDE: Integrating Speech Language Model with LLM for Spontaneous Spoken Dialogue Generation

Add code
Jan 01, 2025
Figure 1 for SLIDE: Integrating Speech Language Model with LLM for Spontaneous Spoken Dialogue Generation
Figure 2 for SLIDE: Integrating Speech Language Model with LLM for Spontaneous Spoken Dialogue Generation
Figure 3 for SLIDE: Integrating Speech Language Model with LLM for Spontaneous Spoken Dialogue Generation
Figure 4 for SLIDE: Integrating Speech Language Model with LLM for Spontaneous Spoken Dialogue Generation
Viaarxiv icon

Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling

Add code
Dec 19, 2024
Figure 1 for Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling
Figure 2 for Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling
Figure 3 for Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling
Figure 4 for Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling
Viaarxiv icon

Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification

Add code
Dec 02, 2024
Viaarxiv icon

Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification

Add code
Oct 22, 2024
Figure 1 for Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification
Figure 2 for Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification
Figure 3 for Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification
Figure 4 for Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification
Viaarxiv icon

WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction

Add code
Sep 24, 2024
Figure 1 for WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction
Figure 2 for WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction
Figure 3 for WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction
Figure 4 for WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction
Viaarxiv icon

Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models

Add code
Sep 11, 2024
Viaarxiv icon

Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion

Add code
Sep 10, 2024
Viaarxiv icon

Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching

Add code
Sep 07, 2024
Figure 1 for Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching
Figure 2 for Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching
Figure 3 for Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching
Figure 4 for Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching
Viaarxiv icon

Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning

Add code
Jul 21, 2024
Viaarxiv icon

Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement

Add code
Jun 19, 2024
Figure 1 for Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement
Figure 2 for Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement
Viaarxiv icon