Title: VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over

Oct 09, 2021

Junchen Lu, Berrak Sisman, Rui Liu, Mingyang Zhang, Haizhou Li

Abstract: In this paper, we formulate a novel task, denoted automatic voice over (AVO), which is to synthesize speech in sync with a silent pre-recorded video. Unlike traditional speech synthesis, AVO seeks to generate not only human-sounding speech but also perfect lip-speech synchronization. A natural solution to AVO is to condition the speech rendering on the temporal progression of the lip sequence in the video. We propose a novel text-to-speech model conditioned on visual input, named VisualTTS, for accurate lip-speech synchronization. VisualTTS adopts two novel mechanisms: 1) textual-visual attention and 2) a visual fusion strategy during acoustic decoding. Both contribute to forming an accurate alignment between the input text content and the lip motion in the input lip sequence. Experimental results show that VisualTTS achieves accurate lip-speech synchronization and outperforms all baseline systems.

* Submitted to ICASSP 2022
