Text-driven human motion generation is an emerging task in animation and humanoid robot design. Existing algorithms directly generate the full motion sequence, which is computationally expensive and prone to errors because it pays no special attention to key poses, a practice that has been the cornerstone of animation for decades. We propose KeyMotion, which generates plausible human motion sequences corresponding to input text by first generating keyframes and then in-filling the frames between them. We use a Variational Autoencoder (VAE) with Kullback-Leibler regularization to project the keyframes into a latent space, reducing dimensionality and accelerating the subsequent diffusion process. For the reverse diffusion, we propose a novel Parallel Skip Transformer that performs cross-modal attention between the keyframe latents and the text condition. To complete the motion sequence, we propose a text-guided Transformer designed to perform motion in-filling while preserving fidelity and adhering to the physical constraints of human motion. Experiments show that our method achieves state-of-the-art results on the HumanML3D dataset, outperforming others on all R-precision metrics and on MultiModal Distance. KeyMotion also achieves competitive performance on the KIT dataset, obtaining the best results on the Top-3 R-precision, FID, and Diversity metrics.
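To make the two-stage pipeline concrete, the following is a minimal PyTorch-style sketch of the abstract's flow: a KL-regularized VAE compresses keyframes into latents, a denoiser iteratively refines those latents under text conditioning, and an in-filling module completes the sequence. All module names, dimensions, and interfaces here (e.g. `KeyframeVAE`, `denoiser`, `infill`, `latent_dim=256`) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class KeyframeVAE(nn.Module):
    """Hypothetical keyframe VAE: encodes K keyframes of pose dimension D
    into a low-dimensional latent with a Kullback-Leibler penalty."""
    def __init__(self, k_frames=8, pose_dim=263, latent_dim=256):
        super().__init__()
        self.enc = nn.Linear(k_frames * pose_dim, 2 * latent_dim)  # predicts mean and log-variance
        self.dec = nn.Linear(latent_dim, k_frames * pose_dim)

    def encode(self, keyframes):
        mu, logvar = self.enc(keyframes.flatten(1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()  # KL regularizer
        return z, kl

    def decode(self, z):
        return self.dec(z)

def generate(text_emb, denoiser, vae, infill, latent_dim=256, steps=50):
    """Two-stage generation: (1) reverse diffusion over keyframe latents
    conditioned on text, (2) decode keyframes and in-fill the full motion."""
    z = torch.randn(text_emb.size(0), latent_dim)
    for t in reversed(range(steps)):
        z = denoiser(z, t, text_emb)        # one denoising step (the paper's Parallel Skip Transformer)
    keyframes = vae.decode(z)
    return infill(keyframes, text_emb)      # text-guided motion in-filling Transformer
```

Operating in the VAE latent space, rather than on raw poses, is what lets the diffusion stage run over a much smaller representation, which is the source of the speedup the abstract claims.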