Sai Tanmay Reddy Chakkera

JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation

Sep 18, 2024

Sai Tanmay Reddy Chakkera, Aggelina Chatziagapi, Dimitris Samaras

Abstract: We introduce a novel method for joint expression and audio-guided talking face generation. Recent approaches either struggle to preserve the speaker identity or fail to produce faithful facial expressions. To address these challenges, we propose a NeRF-based network. Since we train our network on monocular videos without any ground truth, it is essential to learn disentangled representations for audio and expression. We first learn audio features in a self-supervised manner, given utterances from multiple subjects. By incorporating a contrastive learning technique, we ensure that the learned audio features are aligned to the lip motion and disentangled from the muscle motion of the rest of the face. We then devise a transformer-based architecture that learns expression features, capturing long-range facial expressions and disentangling them from the speech-specific mouth movements. Through quantitative and qualitative evaluation, we demonstrate that our method can synthesize high-fidelity talking face videos, achieving state-of-the-art facial expression transfer along with lip synchronization to unseen audio.
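The abstract mentions a contrastive learning technique that aligns audio features with lip motion, but does not state the loss. As a rough illustration only, the sketch below shows a generic InfoNCE-style objective, which is one common form of contrastive loss and is not claimed to be the authors' formulation. The names `info_nce`, `audio_feats`, `lip_feats`, and `temperature` are hypothetical, and the toy 2-D vectors stand in for learned embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)

def info_nce(audio_feats, lip_feats, temperature=0.1):
    """Mean InfoNCE loss over a batch (hypothetical helper).

    Assumption: audio_feats[i] and lip_feats[i] come from the same video
    segment and form the positive pair; every other lip_feats[j] in the
    batch is treated as a negative.
    """
    n = len(audio_feats)
    total = 0.0
    for i in range(n):
        logits = [cosine(audio_feats[i], lip_feats[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the matching lip feature.
        total += log_denom - logits[i]
    return total / n

# Toy batch: matching pairs give a low loss, mismatched pairs a high one.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing a loss of this kind pulls each audio feature toward the lip feature of the same segment and pushes it away from the others, which is the general sense in which contrastive training can tie audio features to lip motion.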

* Accepted by BMVC 2024. Project Page: https://starc52.github.io/publications/2024-07-19-JEAN
