Yingming Gao

Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition

Aug 18, 2024
Via arXiv

SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion

Jun 09, 2024
Via arXiv

Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model

Jun 06, 2024
Via arXiv

Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining

Jun 06, 2024
Via arXiv

Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation

Jan 02, 2024
Via arXiv

Frame-level emotional state alignment method for speech emotion recognition

Dec 27, 2023
Via arXiv

CONCSS: Contrastive-based Context Comprehension for Dialogue-appropriate Prosody in Conversational Speech Synthesis

Dec 16, 2023
Via arXiv

Spoken Language Intelligence of Large Language Models for Language Learning

Aug 28, 2023
Via arXiv

M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis

May 03, 2023
Via arXiv

A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis

Oct 07, 2022
Via arXiv