Lirong Dai

CSSinger: End-to-End Chunkwise Streaming Singing Voice Synthesis System Based on Conditional Variational Autoencoder

Dec 12, 2024

SiFiSinger: A High-Fidelity End-to-End Singing Voice Synthesizer based on Source-filter Model

Oct 16, 2024

Deep CLAS: Deep Contextual Listen, Attend and Spell

Sep 26, 2024

LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation

Aug 22, 2024

LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance

Jun 08, 2024

Adversarial speech for voice privacy protection from Personalized Speech generation

Jan 22, 2024

Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation

Jan 07, 2024

Rep2wav: Noise Robust text-to-speech Using self-supervised representations

Sep 04, 2023

VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

Nov 21, 2022

SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training

Oct 07, 2022