Zhengxi Liu

ARCHI-TTS: A flow-matching-based Text-to-Speech Model with Self-supervised Semantic Aligner and Accelerated Inference

Feb 05, 2026
Via arXiv

Improving Audio Generation with Visual Enhanced Caption

Jul 05, 2024
Via arXiv

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Jun 04, 2024
Via arXiv

Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech

Jul 13, 2022
Via arXiv

Basis-MelGAN: Efficient Neural Vocoder Based on Audio Decomposition

Jun 25, 2021
Via arXiv