Xie Chen

Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis

Jan 20, 2026
Via arXiv

SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing

Jan 14, 2026
Via arXiv

ReStyle-TTS: Relative and Continuous Style Control for Zero-Shot Speech Synthesis

Jan 07, 2026
Via arXiv

Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training

Jan 06, 2026
Via arXiv

Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis

Dec 21, 2025
Via arXiv

UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models

Oct 26, 2025
Via arXiv

The Universal Landscape of Human Reasoning

Oct 24, 2025
Via arXiv

Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception

Oct 14, 2025
Via arXiv

Beyond Surface Reasoning: Unveiling the True Long Chain-of-Thought Capacity of Diffusion Large Language Models

Oct 10, 2025
Via arXiv

Towards Responsible Evaluation for Text-to-Speech

Oct 08, 2025
Via arXiv