Picture for Rilin Chen

Rilin Chen

STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment

Add code
Sep 13, 2024
Figure 1 for STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
Figure 2 for STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
Figure 3 for STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
Figure 4 for STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
Viaarxiv icon

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis

Add code
Sep 11, 2024
Figure 1 for SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
Figure 2 for SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
Figure 3 for SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
Figure 4 for SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
Viaarxiv icon

LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation

Add code
Aug 22, 2024
Viaarxiv icon

Video-to-Audio Generation with Hidden Alignment

Add code
Jul 10, 2024
Figure 1 for Video-to-Audio Generation with Hidden Alignment
Figure 2 for Video-to-Audio Generation with Hidden Alignment
Figure 3 for Video-to-Audio Generation with Hidden Alignment
Figure 4 for Video-to-Audio Generation with Hidden Alignment
Viaarxiv icon

SMRU: Split-and-Merge Recurrent-based UNet for Acoustic Echo Cancellation and Noise Suppression

Add code
Jun 17, 2024
Viaarxiv icon

LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance

Add code
Jun 08, 2024
Viaarxiv icon

Rep2wav: Noise Robust text-to-speech Using self-supervised representations

Add code
Sep 04, 2023
Viaarxiv icon