Jilong Wu

Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation

Oct 27, 2024

Self-Supervised Representations for Singing Voice Conversion

Mar 21, 2023

Synthetic Cross-accent Data Augmentation for Automatic Speech Recognition

Mar 01, 2023

Voice-preserving Zero-shot Multiple Accent Conversion

Nov 23, 2022

Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders

Oct 28, 2022

VocBench: A Neural Vocoder Benchmark for Speech Synthesis

Dec 06, 2021

Multi-rate attention architecture for fast streamable Text-to-speech spectrum modeling

Apr 01, 2021