Sitong Cheng

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

Mar 03, 2025

Audio-FLAN: A Preliminary Release

Feb 23, 2025

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

Oct 14, 2024

CN-CELEB: a challenging Chinese speaker recognition dataset

Oct 31, 2019