Ya Li

Psy-Copilot: Visual Chain of Thought for Counseling

Mar 05, 2025
Via arXiv

Psy-Insight: Explainable Multi-turn Bilingual Dataset for Mental Health Counseling

Mar 05, 2025
Via arXiv

Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation

Dec 11, 2024
Via arXiv

Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition

Aug 18, 2024
Via arXiv

SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion

Jun 09, 2024
Via arXiv

Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining

Jun 06, 2024
Via arXiv

Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model

Jun 06, 2024
Via arXiv

CRB Minimization for RIS-aided mmWave Integrated Sensing and Communications

Jan 02, 2024
Via arXiv

Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation

Jan 02, 2024
Via arXiv

Frame-level emotional state alignment method for speech emotion recognition

Dec 27, 2023
Via arXiv