Muhammad Hamza Mughal

ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis

Mar 26, 2024

Muhammad Hamza Mughal, Rishabh Dabral, Ikhsanul Habibie, Lucia Donatelli, Marc Habermann, Christian Theobalt

Abstract: Gestures play a key role in human communication. Recent methods for co-speech gesture generation manage to generate beat-aligned motions but struggle to generate gestures that are semantically aligned with the utterance. Compared to beat gestures, which align naturally with the audio signal, semantically coherent gestures require modeling the complex interactions between language and human motion, and can be controlled by focusing on certain words. Therefore, we present ConvoFusion, a diffusion-based approach for multi-modal gesture synthesis, which not only generates gestures from multi-modal speech inputs but also facilitates controllability in gesture synthesis. Our method proposes two guidance objectives that allow users to modulate the impact of different conditioning modalities (e.g., audio vs. text) and to choose certain words to be emphasized during gesturing. Our method is versatile in that it can be trained to generate either monologue gestures or conversational gestures. To further advance research on multi-party interactive gestures, we release the DnD Group Gesture dataset, which contains 6 hours of gesture data showing 5 people interacting with one another. We compare our method with several recent works and demonstrate its effectiveness on a variety of tasks. We urge the reader to watch our supplementary video at our website.

* CVPR 2024. Project Page: https://vcai.mpi-inf.mpg.de/projects/ConvoFusion/
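
The abstract above mentions guidance that lets users modulate the impact of each conditioning modality. As a rough illustration only, the sketch below shows a generic multi-condition classifier-free guidance combination, which is a common technique in diffusion models and not necessarily the formulation used in ConvoFusion. The function name `combine_guidance`, the modality keys, and the weight values are all hypothetical.

```python
from typing import Dict, List


def combine_guidance(
    eps_uncond: List[float],
    eps_cond: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Blend per-modality noise predictions using per-modality weights.

    Generic classifier-free guidance with several conditions (an assumption,
    not the paper's exact objective):
        eps = eps_uncond + sum_m w_m * (eps_cond[m] - eps_uncond)
    """
    guided = list(eps_uncond)
    for name, eps in eps_cond.items():
        w = weights.get(name, 0.0)  # modalities without a weight are ignored
        for i, (e_c, e_u) in enumerate(zip(eps, eps_uncond)):
            guided[i] += w * (e_c - e_u)
    return guided


# Toy two-dimensional noise predictions (made-up numbers for illustration).
eps_uncond = [0.0, 1.0]
eps_cond = {"audio": [1.0, 1.0], "text": [0.0, 3.0]}
print(combine_guidance(eps_uncond, eps_cond, {"audio": 2.0, "text": 0.5}))
```

Raising the weight of one modality pulls the denoising direction toward that modality's prediction, and setting every weight to zero recovers the unconditional prediction.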

MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis

Dec 08, 2022

Rishabh Dabral, Muhammad Hamza Mughal, Vladislav Golyanik, Christian Theobalt

Abstract: Conventional methods for human motion synthesis are either deterministic or struggle with the trade-off between motion diversity and motion quality. In response to these limitations, we introduce MoFusion, a new denoising-diffusion-based framework for high-quality conditional human motion synthesis that can generate long, temporally plausible, and semantically accurate motions based on a range of conditioning contexts (such as music and text). We also present ways to introduce well-known kinematic losses for motion plausibility within the motion diffusion framework through our scheduled weighting strategy. The learned latent space can be used for several interactive motion editing applications, such as inbetweening, seed conditioning, and text-based editing, thus providing crucial abilities for virtual character animation and robotics. Through comprehensive quantitative evaluations and a perceptual user study, we demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature. We urge the reader to watch our supplementary video and visit https://vcai.mpi-inf.mpg.de/projects/MoFusion.

* 11 pages, 6 figures, 2 tables; project page: https://vcai.mpi-inf.mpg.de/projects/MoFusion
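
The abstract above refers to a scheduled weighting strategy for kinematic losses inside the diffusion framework. The sketch below is one plausible reading, not the authors' implementation: it assumes the kinematic term is weighted by the cumulative noise-schedule product at timestep t, so the term counts less when the sample is very noisy. The linear beta schedule, its default values, and the names `alpha_bar_schedule` and `total_loss` are assumptions made for illustration.

```python
from typing import List


def alpha_bar_schedule(
    num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02
) -> List[float]:
    """Cumulative product of (1 - beta_t) for a linear beta schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def total_loss(
    data_loss: float, kinematic_loss: float, t: int, alpha_bars: List[float]
) -> float:
    """Add the kinematic term with a timestep-dependent weight (assumed form)."""
    weight = alpha_bars[t]  # near 1 at low noise, near 0 at high noise
    return data_loss + weight * kinematic_loss


alpha_bars = alpha_bar_schedule(1000)
print(total_loss(0.5, 0.2, t=0, alpha_bars=alpha_bars))
print(total_loss(0.5, 0.2, t=999, alpha_bars=alpha_bars))
```

Under this assumption, the kinematic penalty contributes almost fully at early (low-noise) timesteps and is nearly switched off at the noisiest timesteps, where a motion estimate would be least reliable.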
