Picture for Long Zhou

Long Zhou

ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation

Add code
Oct 27, 2024
Viaarxiv icon

Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation

Add code
Sep 06, 2024
Viaarxiv icon

Autoregressive Speech Synthesis without Vector Quantization

Add code
Jul 11, 2024
Viaarxiv icon

VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment

Add code
Jun 12, 2024
Viaarxiv icon

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

Add code
Jun 08, 2024
Viaarxiv icon

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

Add code
May 28, 2024
Viaarxiv icon

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

Add code
Apr 10, 2024
Viaarxiv icon

WavLLM: Towards Robust and Adaptive Speech Large Language Model

Add code
Mar 31, 2024
Viaarxiv icon

Boosting Large Language Model for Speech Synthesis: An Empirical Study

Add code
Dec 30, 2023
Viaarxiv icon

Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction

Add code
Sep 25, 2023
Viaarxiv icon