Title: VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation

May 28, 2024

Qilin Wang, Zhengkai Jiang, Chengming Xu, Jiangning Zhang, Yabiao Wang, Xinyi Zhang, Yun Cao, Weijian Cao, Chengjie Wang, Yanwei Fu

Abstract: Human image animation involves generating a video from a static image by following a specified pose sequence. Current approaches typically adopt a multi-stage pipeline that learns appearance and motion separately, which often leads to appearance degradation and temporal inconsistencies. To address these issues, we propose VividPose, an end-to-end pipeline based on Stable Video Diffusion (SVD) that ensures superior temporal stability. To enhance the retention of human identity, we propose an identity-aware appearance controller that integrates additional facial information without compromising other appearance details such as clothing texture and background. This ensures that the generated videos maintain high fidelity to the identity of the human subject, preserving key facial features across various poses. To accommodate diverse human body shapes and hand movements, we introduce a geometry-aware pose controller that uses both dense rendering maps from SMPL-X and sparse skeleton maps. This enables accurate alignment of pose and shape in the generated videos, providing a robust framework that handles a wide range of body shapes and dynamic hand movements. Extensive qualitative and quantitative experiments on the UBCFashion and TikTok benchmarks demonstrate that our method achieves state-of-the-art performance. Furthermore, VividPose exhibits superior generalization on our proposed in-the-wild dataset. Code and models will be made available.
