Fenglong Xie

Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation

Sep 18, 2024

SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis

Sep 02, 2024

Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder

Jun 05, 2024

QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning

Aug 31, 2023

Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations

Oct 27, 2022

A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS

Sep 22, 2022

Nana-HDR: A Non-attentive Non-autoregressive Hybrid Model for TTS

Sep 28, 2021

Triple M: A Practical Neural Text-to-speech System With Multi-guidance Attention And Multi-band Multi-time LPCNet

Feb 09, 2021