Wenhao Guan

Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations

Sep 12, 2024
Via arXiv

Dynamic Language Group-Based MoE: Enhancing Efficiency and Flexibility for Code-Switching Speech Recognition

Jul 26, 2024
Via arXiv

LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation

Jun 12, 2024
Via arXiv

LOP-Field: Brain-inspired Layout-Object-Position Fields for Robotic Scene Understanding

Jun 11, 2024
Via arXiv

FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View

Mar 05, 2024
Via arXiv

MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis

Dec 28, 2023
Via arXiv

ReFlow-TTS: A Rectified Flow Model for High-fidelity Text-to-Speech

Sep 29, 2023
Via arXiv

Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge

Jun 07, 2023
Via arXiv