Ziyue Jiang

WavChat: A Survey of Spoken Dialogue Models

Nov 26, 2024

MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes

Oct 09, 2024

TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control

Sep 26, 2024

GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks

Sep 26, 2024

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

Aug 29, 2024

MulliVC: Multi-lingual Voice Conversion With Cycle Consistency

Aug 08, 2024

MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis

Jul 19, 2024

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

Jun 03, 2024

Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models

Feb 20, 2024

MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech

Feb 14, 2024