Title: Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement

Sep 03, 2022

Siddarth Ravichandran, Ondřej Texler, Dimitar Dinev, Hyun Jae Kang

Abstract: Over the last few decades, many aspects of human life have been enhanced with virtual domains, from the advent of digital assistants such as Amazon's Alexa and Apple's Siri to the latest metaverse efforts of the rebranded Meta. These trends underscore the importance of generating photorealistic visual depictions of humans. This has led to the rapid growth of so-called deepfake and talking head generation methods in recent years. Despite their impressive results and popularity, they usually lack certain qualitative aspects such as texture quality, lip synchronization, or resolution, and practical aspects such as the ability to run in real time. To allow virtual human avatars to be used in practical scenarios, we propose an end-to-end framework for synthesizing high-quality virtual human faces capable of speech, with a special emphasis on performance. We introduce a novel network utilizing visemes as an intermediate audio representation and a novel data augmentation strategy employing a hierarchical image synthesis approach that allows disentanglement of the different modalities used to control the global head motion. Our method runs in real time and is able to deliver superior results compared to the current state of the art.
