Title: Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models

May 26, 2024

Hanwen Liang, Yuyang Yin, Dejia Xu, Hanxue Liang, Zhangyang Wang, Konstantinos N. Plataniotis, Yao Zhao, Yunchao Wei

Abstract: The availability of large-scale multimodal datasets and advances in diffusion models have significantly accelerated progress in 4D content generation. Most prior approaches rely on multiple image or video diffusion models. They either use score distillation sampling for optimization or generate pseudo novel views for direct supervision. However, these methods suffer from slow optimization and multi-view inconsistency. Spatial and temporal consistency in 4D geometry have been extensively explored in 3D-aware diffusion models and traditional monocular video diffusion models, respectively. Building on this foundation, we propose a strategy to migrate the temporal consistency of video diffusion models to the spatial-temporal consistency required for 4D generation. Specifically, we present a novel framework, Diffusion4D, for efficient and scalable 4D content generation. Leveraging a meticulously curated dynamic 3D dataset, we develop a 4D-aware video diffusion model capable of synthesizing orbital views of dynamic 3D assets. To control the dynamic strength of these assets, we introduce a 3D-to-4D motion magnitude metric as guidance. Additionally, we propose a novel motion magnitude reconstruction loss and 3D-aware classifier-free guidance to refine the learning and generation of motion dynamics. After obtaining orbital views of the 4D asset, we perform explicit 4D construction with Gaussian splatting in a coarse-to-fine manner. The synthesized multi-view consistent 4D image set enables us to swiftly generate high-fidelity and diverse 4D assets within just a few minutes. Extensive experiments demonstrate that our method surpasses prior state-of-the-art techniques in generation efficiency and 4D geometry consistency across various prompt modalities.
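The abstract mentions a 3D-aware classifier-free guidance but does not give its formula. As background, the sketch below shows only the standard classifier-free guidance step it builds on. The step combines a reference noise prediction with a conditional one, eps = eps_ref + scale * (eps_cond - eps_ref). The function name `guided_noise` and its parameters are hypothetical, and noise predictions are modeled as flat lists of floats for simplicity. This is not the paper's implementation.

```python
from typing import Sequence


def guided_noise(
    eps_ref: Sequence[float],
    eps_cond: Sequence[float],
    scale: float,
) -> list[float]:
    """Standard classifier-free guidance on a flattened noise prediction.

    In ordinary CFG, `eps_ref` is the unconditional prediction:
        eps = eps_ref + scale * (eps_cond - eps_ref)
    scale = 0 returns eps_ref, scale = 1 returns eps_cond,
    and scale > 1 extrapolates beyond the conditional prediction.
    (Hypothetical helper for illustration only.)
    """
    if len(eps_ref) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    # Move from the reference prediction toward (and past) the conditional one.
    return [r + scale * (c - r) for r, c in zip(eps_ref, eps_cond)]


# Toy example with 2-element "noise predictions".
print(guided_noise([0.0, 1.0], [1.0, 3.0], scale=2.0))  # prints [2.0, 5.0]
```

A 3D-aware variant would presumably differ in which prediction serves as `eps_ref`. One example would be a prediction conditioned on a static 3D input rather than an unconditional one. That pairing is an assumption, not something the abstract states.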

* Project page: https://vita-group.github.io/Diffusion4D
