Picture for Dongchao Yang

Dongchao Yang

A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models

Add code
Nov 13, 2024
Viaarxiv icon

Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models

Add code
Sep 21, 2024
Figure 1 for Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models
Figure 2 for Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models
Figure 3 for Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models
Figure 4 for Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models
Viaarxiv icon

Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation

Add code
Sep 18, 2024
Viaarxiv icon

SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis

Add code
Sep 02, 2024
Viaarxiv icon

UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner

Add code
Jun 14, 2024
Figure 1 for UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Figure 2 for UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Figure 3 for UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Figure 4 for UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Viaarxiv icon

CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction

Add code
Jun 12, 2024
Figure 1 for CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
Figure 2 for CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
Figure 3 for CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
Figure 4 for CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
Viaarxiv icon

Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder

Add code
Jun 05, 2024
Viaarxiv icon

SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models

Add code
Jun 04, 2024
Figure 1 for SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
Figure 2 for SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
Figure 3 for SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
Figure 4 for SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
Viaarxiv icon

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Add code
Apr 06, 2024
Viaarxiv icon

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Add code
Mar 05, 2024
Viaarxiv icon