
Shivam Mehta

Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech

Jun 08, 2024
Via arXiv

Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis

Apr 30, 2024
Via arXiv

Unified speech and gesture synthesis using flow matching

Oct 08, 2023
Via arXiv

Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation

Sep 11, 2023
Via arXiv

Matcha-TTS: A fast TTS architecture with conditional flow matching

Sep 06, 2023
Via arXiv

Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis

Jun 15, 2023
Via arXiv

Prosody-controllable spontaneous TTS with neural HMMs

Nov 24, 2022
Via arXiv

OverFlow: Putting flows on top of neural transducers for better TTS

Nov 13, 2022
Via arXiv

Neural HMMs are all you need (for high-quality attention-free TTS)

Sep 03, 2021
Via arXiv