Yingming Gao

SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding

Sep 18, 2025, via arXiv

Psy-Copilot: Visual Chain of Thought for Counseling

Mar 05, 2025, via arXiv

Psy-Insight: Explainable Multi-turn Bilingual Dataset for Mental Health Counseling

Mar 05, 2025, via arXiv

Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition

Aug 18, 2024, via arXiv

SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion

Jun 09, 2024, via arXiv

Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining

Jun 06, 2024, via arXiv

Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model

Jun 06, 2024, via arXiv

Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation

Jan 02, 2024, via arXiv

Frame-level emotional state alignment method for speech emotion recognition

Dec 27, 2023, via arXiv

CONCSS: Contrastive-based Context Comprehension for Dialogue-appropriate Prosody in Conversational Speech Synthesis

Dec 16, 2023, via arXiv