Title: Let's face it: Probabilistic multi-modal interlocutor-aware generation of facial gestures in dyadic settings

Jun 11, 2020

Patrik Jonell, Taras Kucherenko, Gustav Eje Henter, Jonas Beskow

Abstract: To enable more natural face-to-face interactions, conversational agents need to adapt their behavior to their interlocutors. One key aspect of this is the generation of appropriate non-verbal behavior for the agent, for example facial gestures, here defined as facial expressions and head movements. Most existing gesture-generating systems do not utilize multi-modal cues from the interlocutor when synthesizing non-verbal behavior. Those that do typically use deterministic methods that risk producing repetitive and non-vivid motions. In this paper, we introduce a probabilistic method to synthesize interlocutor-aware facial gestures, represented by highly expressive FLAME parameters, in dyadic conversations. Our contributions are: a) a method for feature extraction from multi-party video and speech recordings, resulting in a representation that allows for independent control and manipulation of expression and speech articulation in a 3D avatar; b) an extension of MoGlow, a recent motion-synthesis method based on normalizing flows, so that it also takes multi-modal signals from the interlocutor as input and outputs interlocutor-aware facial gestures; and c) subjective and objective experiments assessing the use and relative importance of the different modalities in the synthesized output. The results show that the model successfully leverages the input from the interlocutor to generate more appropriate behavior.
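The abstract names normalizing flows (MoGlow) as the generative backbone. As a rough illustration only, the sketch below shows a single conditional affine coupling layer of the kind Glow-style flows are built from, written in plain Python. It is not the authors' model: the conditioning vector `cond` (hypothetically standing in for features of the agent's own speech and of the interlocutor's speech and face), the toy dimensions, and the fixed random linear map that replaces a learned network are all assumptions. Because the layer is exactly invertible, drawing a Gaussian latent and inverting it gives a different output on each draw for the same conditioning, which is what distinguishes probabilistic generation from a deterministic mapping.

```python
import math
import random


def _linear(weights, bias, inputs):
    """Fixed linear map (stand-in for a learned network): one output per row."""
    return [sum(w * v for w, v in zip(row, inputs)) + b for row, b in zip(weights, bias)]


class ConditionalAffineCoupling:
    """Toy conditional affine coupling layer (Glow/RealNVP style).

    The first half of the input passes through unchanged; the second half is
    scaled and shifted by amounts computed from the first half plus a
    conditioning vector `cond`. In this hypothetical sketch, `cond` would hold
    speech/face features of the agent and the interlocutor.
    """

    def __init__(self, dim, cond_dim, seed=0):
        assert dim % 2 == 0, "dim must be even so the input splits in halves"
        self.half = dim // 2
        rng = random.Random(seed)
        in_dim = self.half + cond_dim

        def mat():
            return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(self.half)]

        self.w_s, self.w_t = mat(), mat()
        self.b_s = [0.0] * self.half
        self.b_t = [0.0] * self.half

    def _scale_shift(self, x_a, cond):
        h = list(x_a) + list(cond)
        # tanh bounds each log-scale to (-1, 1) for numerical stability
        log_s = [math.tanh(v) for v in _linear(self.w_s, self.b_s, h)]
        t = _linear(self.w_t, self.b_t, h)
        return log_s, t

    def forward(self, x, cond):
        """Data -> latent. Returns (z, log|det J|)."""
        x_a, x_b = x[:self.half], x[self.half:]
        log_s, t = self._scale_shift(x_a, cond)
        z_b = [xb * math.exp(ls) + tt for xb, ls, tt in zip(x_b, log_s, t)]
        return list(x_a) + z_b, sum(log_s)

    def inverse(self, z, cond):
        """Latent -> data; exact inverse of forward() for the same cond."""
        z_a, z_b = z[:self.half], z[self.half:]
        log_s, t = self._scale_shift(z_a, cond)  # z_a == x_a, so same s and t
        x_b = [(zb - tt) * math.exp(-ls) for zb, ls, tt in zip(z_b, log_s, t)]
        return list(z_a) + x_b


def sample(layer, cond, dim, rng):
    """Draw z ~ N(0, I) and map it back to a toy output vector."""
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return layer.inverse(z, cond)


if __name__ == "__main__":
    layer = ConditionalAffineCoupling(dim=4, cond_dim=3)
    cond = [0.5, -1.0, 0.2]
    rng = random.Random(1)
    print(sample(layer, cond, 4, rng))
    print(sample(layer, cond, 4, rng))  # same cond, different output
```

A real flow stacks many such layers (with permutations or invertible 1x1 convolutions between them) and trains the networks by maximizing the exact log-likelihood, which the returned log-determinant makes tractable.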
