Yingming Gao

HQ-SVC: Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios

Nov 15, 2025
Via arXiv

SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding

Sep 18, 2025
Via arXiv

Psy-Insight: Explainable Multi-turn Bilingual Dataset for Mental Health Counseling

Mar 05, 2025
Via arXiv

Psy-Copilot: Visual Chain of Thought for Counseling

Mar 05, 2025
Via arXiv

Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition

Aug 18, 2024
Via arXiv

SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion

Jun 09, 2024
Via arXiv

Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model

Jun 06, 2024
Via arXiv

Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining

Jun 06, 2024
Via arXiv

Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation

Jan 02, 2024
Via arXiv

Frame-level emotional state alignment method for speech emotion recognition

Dec 27, 2023
Via arXiv