Picture for Dongchao Yang

Dongchao Yang

MoonCast: High-Quality Zero-Shot Podcast Generation

Add code
Mar 19, 2025
Viaarxiv icon

InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training

Add code
Mar 04, 2025
Viaarxiv icon

Audio-FLAN: A Preliminary Release

Add code
Feb 23, 2025
Viaarxiv icon

ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors

Add code
Feb 22, 2025
Viaarxiv icon

A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models

Add code
Nov 13, 2024
Viaarxiv icon

Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models

Add code
Sep 21, 2024
Figure 1 for Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models
Figure 2 for Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models
Figure 3 for Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models
Figure 4 for Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models
Viaarxiv icon

Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation

Add code
Sep 18, 2024
Viaarxiv icon

SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis

Add code
Sep 02, 2024
Figure 1 for SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis
Figure 2 for SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis
Figure 3 for SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis
Figure 4 for SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis
Viaarxiv icon

UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner

Add code
Jun 14, 2024
Figure 1 for UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Figure 2 for UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Figure 3 for UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Figure 4 for UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Viaarxiv icon

CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction

Add code
Jun 12, 2024
Figure 1 for CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
Figure 2 for CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
Figure 3 for CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
Figure 4 for CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
Viaarxiv icon