Balamurugan Thambiraja

3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing

Dec 01, 2023

Balamurugan Thambiraja, Sadegh Aliakbarian, Darren Cosker, Justus Thies

Abstract: We present 3DiFACE, a novel method for personalized speech-driven 3D facial animation and editing. While existing methods deterministically predict facial animations from speech, they overlook the inherent one-to-many relationship between speech and facial expressions, i.e., there are multiple reasonable facial expression animations matching an audio input. It is especially important in content creation to be able to modify generated motion or to specify keyframes. To enable stochasticity as well as motion editing, we propose a lightweight audio-conditioned diffusion model for 3D facial motion. This diffusion model can be trained on a small 3D motion dataset, maintaining expressive lip motion output. In addition, it can be finetuned for specific subjects, requiring only a short video of the person. Through quantitative and qualitative evaluations, we show that our method outperforms existing state-of-the-art techniques and yields speech-driven animations with greater fidelity and diversity.

* Project page: https://balamuruganthambiraja.github.io/3DiFACE/
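
To make the one-to-many idea and keyframe editing in the abstract concrete, the following is a toy sketch only, not the authors' model. It runs an iterative denoising loop over a 1D "motion" track, injects noise so that different seeds give different results for the same audio, and re-imposes user-specified keyframes after every step. The names `toy_denoiser` and `sample_motion`, the blend weight, and the noise schedule are all hypothetical stand-ins for a trained audio-conditioned network and its sampler.

```python
import random

def toy_denoiser(x, audio):
    """Hypothetical stand-in for a learned denoiser: nudges the current
    motion estimate toward a per-frame audio-derived target."""
    w = 0.2  # assumed blend weight, chosen only for illustration
    return [(1 - w) * xi + w * ai for xi, ai in zip(x, audio)]

def sample_motion(audio, keyframes=None, steps=50, seed=None):
    """Draw one motion track for `audio` (a list of floats, one per frame).

    keyframes: optional {frame_index: value} that the result must match.
    """
    rng = random.Random(seed)
    keyframes = keyframes or {}
    # Start from pure Gaussian noise, one value per frame.
    x = [rng.gauss(0.0, 1.0) for _ in audio]
    for t in reversed(range(steps)):
        x = toy_denoiser(x, audio)
        if t > 0:
            # Inject shrinking noise so samples stay stochastic (assumed schedule).
            sigma = 0.3 * t / steps
            x = [xi + rng.gauss(0.0, sigma) for xi in x]
        # Editing: overwrite the constrained frames after every step.
        for i, v in keyframes.items():
            x[i] = v
    return x

audio = [0.0, 0.5, 1.0, 0.5, 0.0]
sample_a = sample_motion(audio, seed=1)
sample_b = sample_motion(audio, seed=2)                    # same audio, different motion
edited = sample_motion(audio, keyframes={2: 0.0}, seed=1)  # frame 2 pinned to 0.0
```

A deterministic regressor would return the same track for `sample_a` and `sample_b`; the sampling loop is what allows several plausible outputs per audio clip, and the keyframe overwrite shows one simple way a user constraint can be held fixed while the rest of the sequence is generated.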

Imitator: Personalized Speech-driven 3D Facial Animation

Dec 30, 2022

Balamurugan Thambiraja, Ikhsanul Habibie, Sadegh Aliakbarian, Darren Cosker, Christian Theobalt, Justus Thies

Abstract: Speech-driven 3D facial animation has been widely explored, with applications in gaming, character animation, virtual reality, and telepresence systems. State-of-the-art methods deform the face topology of the target actor to sync the input audio without considering the identity-specific speaking style and facial idiosyncrasies of the target actor, thus resulting in unrealistic and inaccurate lip movements. To address this, we present Imitator, a speech-driven facial expression synthesis method, which learns identity-specific details from a short input video and produces novel facial expressions matching the identity-specific speaking style and facial idiosyncrasies of the target actor. Specifically, we train a style-agnostic transformer on a large facial expression dataset which we use as a prior for audio-driven facial expressions. Based on this prior, we optimize for identity-specific speaking style based on a short reference video. To train the prior, we introduce a novel loss function based on detected bilabial consonants to ensure plausible lip closures and consequently improve the realism of the generated expressions. Through detailed experiments and a user study, we show that our approach produces temporally coherent facial expressions from input audio while preserving the speaking style of the target actors.

* https://youtu.be/JhXTdjiUCUw
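
The abstract mentions a loss based on detected bilabial consonants but does not give its form. As a rough illustration of the idea only, the sketch below assumes a per-frame lip gap (distance between upper and lower lip) and a per-frame phoneme label, and penalizes any remaining gap on frames labeled /p/, /b/, or /m/. The function name `lip_closure_loss`, the squared-gap penalty, and the input format are assumptions made for this example, not the paper's definition.

```python
# English bilabial consonants: the lips must fully close to produce them.
BILABIALS = {"p", "b", "m"}

def lip_closure_loss(lip_gaps, phonemes):
    """Mean squared lip gap over frames labeled as bilabial consonants.

    lip_gaps: per-frame distance between upper and lower lip (hypothetical input).
    phonemes: per-frame phoneme label, same length as lip_gaps.
    Returns 0.0 when no frame is bilabial.
    """
    gaps = [g for g, p in zip(lip_gaps, phonemes) if p in BILABIALS]
    if not gaps:
        return 0.0
    return sum(g * g for g in gaps) / len(gaps)

# Frame 1 (/p/) is closed; frame 2 (/m/) is still open and gets penalized.
loss = lip_closure_loss([0.2, 0.0, 0.4], ["a", "p", "m"])
```

Only the bilabial frames contribute, so open-mouth vowels are left unconstrained while missed lip closures raise the loss.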
