Title: Listen, denoise, action! Audio-driven motion synthesis with diffusion models

Nov 17, 2022

Simon Alexanderson, Rajmund Nagy, Jonas Beskow, Gustav Eje Henter

Abstract: Diffusion models have experienced a surge of interest as highly expressive yet efficiently trainable probabilistic models. We show that these models are an excellent fit for synthesising human motion that co-occurs with audio, for example co-speech gesticulation, since motion is complex and highly ambiguous given audio, calling for a probabilistic description. Specifically, we adapt the DiffWave architecture to model 3D pose sequences, putting Conformers in place of dilated convolutions for improved accuracy. We also demonstrate control over motion style, using classifier-free guidance to adjust the strength of the stylistic expression. Gesture-generation experiments on the Trinity Speech-Gesture and ZeroEGGS datasets confirm that the proposed method achieves top-of-the-line motion quality, with distinctive styles whose expression can be made more or less pronounced. We also synthesise dance motion and path-driven locomotion using the same model architecture. Finally, we extend the guidance procedure to perform style interpolation in a manner that is appealing for synthesis tasks and has connections to product-of-experts models, a contribution we believe is of independent interest. Video examples are available at https://www.speech.kth.se/research/listen-denoise-action/

* 15 pages, 6 figures
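
The abstract mentions classifier-free guidance for adjusting style strength, and an extension of it for style interpolation. The following is a minimal sketch of the general idea only, not the authors' implementation: it uses the standard classifier-free guidance rule on noise predictions, and the blending rule in `blended_noise` is an assumed, illustrative form. The function names, the `gamma` parameter, and the random arrays standing in for network outputs are all hypothetical.

```python
import numpy as np


def guided_noise(eps_uncond, eps_cond, gamma):
    """Standard classifier-free guidance on noise predictions.

    gamma = 0 gives the unconditional prediction, gamma = 1 the
    style-conditional one, and gamma > 1 extrapolates past it,
    which would correspond to a more pronounced style.
    """
    return eps_uncond + gamma * (eps_cond - eps_uncond)


def blended_noise(eps_uncond, eps_styles, gammas):
    """Assumed blending rule: add one weighted guidance term per style.

    This is an illustrative guess at how several styles could be mixed,
    not a formula taken from the paper.
    """
    out = np.array(eps_uncond, dtype=float)
    for eps_style, gamma in zip(eps_styles, gammas):
        out = out + gamma * (eps_style - eps_uncond)
    return out


# Placeholder predictions for a pose sequence of 4 frames x 3 features.
# In a real system these would come from the denoising network.
rng = np.random.default_rng(0)
eps_u = rng.normal(size=(4, 3))   # prediction without style input
eps_a = rng.normal(size=(4, 3))   # prediction conditioned on style A
eps_b = rng.normal(size=(4, 3))   # prediction conditioned on style B

stronger_a = guided_noise(eps_u, eps_a, gamma=1.5)
half_and_half = blended_noise(eps_u, [eps_a, eps_b], [0.5, 0.5])
```

With weights that sum to one, the assumed blend reduces to a weighted average of the style-conditional predictions, which is one reason this form is convenient for a sketch.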
