Huadai Liu

CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models

Dec 13, 2024
Via arXiv

FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation

Oct 16, 2024
Via arXiv

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

Oct 14, 2024
Via arXiv

MEDIC: Zero-shot Music Editing with Disentangled Inversion Control

Jul 18, 2024
Via arXiv

AudioLCM: Text-to-Audio Generation with Latent Consistency Models

Jun 01, 2024
Via arXiv

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation

May 24, 2023
Via arXiv

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer

May 22, 2023
Via arXiv

Wav2SQL: Direct Generalizable Speech-To-SQL Parsing

May 21, 2023
Via arXiv

RMSSinger: Realistic-Music-Score based Singing Voice Synthesis

May 18, 2023
Via arXiv

ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech

Jul 13, 2022
Via arXiv