Title: Facial Expression-Enhanced TTS: Combining Face Representation and Emotion Intensity for Adaptive Speech

Sep 24, 2024

Yunji Chu, Yunseob Shim, Unsang Park

Abstract: We propose FEIM-TTS, a zero-shot text-to-speech (TTS) model that synthesizes emotionally expressive speech aligned with facial images and modulated by emotion intensity. Leveraging deep learning, FEIM-TTS goes beyond traditional TTS systems by interpreting facial cues and adjusting to emotional nuances without depending on labeled datasets. To address the scarcity of audio-visual-emotional data, the model is trained on the LRS3, CREMA-D, and MELD datasets, demonstrating its adaptability. FEIM-TTS's ability to produce high-quality, speaker-agnostic speech makes it suitable for creating adaptable voices for virtual characters. Moreover, FEIM-TTS enhances accessibility for individuals with visual impairments. By integrating emotional nuances into TTS, our model enables dynamic and engaging auditory experiences for webcomics, allowing visually impaired users to enjoy these narratives more fully. Comprehensive evaluation demonstrates its proficiency in modulating emotion and intensity, advancing emotional speech synthesis and accessibility. Samples are available at: https://feim-tts.github.io/.

* 13 pages, 3 figures, accepted to ECCV Workshop ABAW (Affective Behavior Analysis in-the-wild) 7 (to appear)
