Xiqian Yu

AR4D: Autoregressive 4D Generation from Monocular Videos

Jan 03, 2025

Hanxin Zhu, Tianyu He, Xiqian Yu, Junliang Guo, Zhibo Chen, Jiang Bian

Abstract: Recent advancements in generative models have ignited substantial interest in dynamic 3D content creation (i.e., 4D generation). Existing approaches primarily rely on Score Distillation Sampling (SDS) to infer novel-view videos, which typically leads to limited diversity, spatial-temporal inconsistency, and poor prompt alignment because of the inherent randomness of SDS. To tackle these problems, we propose AR4D, a novel paradigm for SDS-free 4D generation. Our paradigm consists of three stages. First, for a monocular video that is either generated or captured, we use pre-trained expert models to create a 3D representation of the first frame, which is further fine-tuned to serve as the canonical space. Second, motivated by the fact that videos naturally unfold in an autoregressive manner, we propose to generate each frame's 3D representation based on the previous frame's representation, as this autoregressive generation facilitates more accurate geometry and motion estimation. To prevent overfitting during this process, we introduce a progressive view sampling strategy that uses priors from pre-trained large-scale 3D reconstruction models. Third, to avoid the appearance drift introduced by autoregressive generation, we incorporate a refinement stage based on a global deformation field and the geometry of each frame's 3D representation. Extensive experiments demonstrate that AR4D achieves state-of-the-art 4D generation without SDS, delivering greater diversity, improved spatial-temporal consistency, and better alignment with input prompts.

* TL;DR: We present a novel method for 4D generation from monocular videos without relying on SDS, delivering greater diversity, improved spatial-temporal consistency, and better alignment with input prompts. Project page: https://hanxinzhu-lab.github.io/AR4D/
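The second stage above (each frame initialized from its predecessor, with novel views sampled progressively to avoid overfitting) can be illustrated with a minimal control-flow sketch. Everything here is an assumption for illustration, not the AR4D implementation: the dictionary "representation", the hypothetical helpers view_half_width, fit_frame and autoregressive_4d, and the linear widening of the azimuth window from 15 to 180 degrees. The actual rendering and optimization against the video frame and the 3D reconstruction prior are left as a commented placeholder.

```python
import copy

import numpy as np


def view_half_width(it, total_iters, start_deg=15.0, end_deg=180.0):
    # Hypothetical schedule: the azimuth window around the input camera
    # widens linearly from start_deg to end_deg over the per-frame iterations.
    frac = min(it / max(total_iters, 1), 1.0)
    return start_deg + (end_deg - start_deg) * frac


def fit_frame(prev_rep, frame_idx, rng, iters=50):
    # Autoregressive step: start from a copy of the previous frame's representation.
    rep = copy.deepcopy(prev_rep)
    rep["frame"] = frame_idx
    rep["max_abs_azimuth"] = 0.0
    for it in range(iters):
        half = view_half_width(it, iters)
        azimuth = rng.uniform(-half, half)  # camera offset from the input view, degrees
        rep["max_abs_azimuth"] = max(rep["max_abs_azimuth"], abs(azimuth))
        # Placeholder: a real system would render `rep` at `azimuth` and update it
        # against the video frame (input view) or a 3D-reconstruction prior (novel views).
    return rep


def autoregressive_4d(canonical, num_frames, seed=0):
    # Chain the per-frame fits: frame t is initialized from frame t - 1,
    # and frame 0 from the canonical (first-frame) representation.
    rng = np.random.default_rng(seed)
    reps, prev = [], canonical
    for t in range(num_frames):
        prev = fit_frame(prev, t, rng)
        reps.append(prev)
    return reps


canonical = {"gaussians": np.zeros((4, 3))}  # toy stand-in for the canonical 3D representation
frames = autoregressive_4d(canonical, num_frames=3)
```

Starting with a narrow view window keeps early updates close to the observed camera, where the frame itself gives supervision, before the sketch admits viewpoints that can only be constrained by a prior.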

GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors

Jun 14, 2024

Xiqian Yu, Hanxin Zhu, Tianyu He, Zhibo Chen

Abstract: Achieving high-resolution novel view synthesis (HRNVS) from low-resolution input views is a challenging task due to the lack of high-resolution data. Previous methods optimize a high-resolution Neural Radiance Field (NeRF) from low-resolution input views but suffer from slow rendering speed. In this work, we base our method on 3D Gaussian Splatting (3DGS) due to its capability of producing high-quality images at a faster rendering speed. To alleviate the shortage of data for higher-resolution synthesis, we propose to leverage off-the-shelf 2D diffusion priors by distilling the 2D knowledge into 3D with Score Distillation Sampling (SDS). Nevertheless, applying SDS directly to Gaussian-based 3D super-resolution leads to undesirable and redundant 3D Gaussian primitives, due to the randomness brought by generative priors. To mitigate this issue, we introduce two simple yet effective techniques to reduce the stochastic disturbances introduced by SDS. Specifically, we 1) shrink the range of diffusion timesteps in SDS with an annealing strategy; 2) randomly discard redundant Gaussian primitives during densification. Extensive experiments demonstrate that our proposed GaussianSR attains high-quality results for HRNVS with only low-resolution inputs on both synthetic and real-world datasets. Project page: https://chchnii.github.io/GaussianSR/
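The two stabilization techniques named in the abstract can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the GaussianSR code: it assumes 1000 diffusion timesteps, a linear annealing of the upper bound of the SDS timestep range from 0.98 to 0.5 (the lower bound fixed at 0.02), and that "randomly discard" means dropping each newly densified Gaussian independently with a hypothetical probability drop_prob. The helper names are hypothetical.

```python
import numpy as np

T = 1000  # assumed number of diffusion timesteps


def annealed_timestep_range(step, total_steps, t_min=0.02, t_max_start=0.98, t_max_end=0.5):
    # Hypothetical linear annealing: the upper bound of the sampled SDS
    # timestep range shrinks as training progresses, reducing noise level.
    frac = min(step / max(total_steps, 1), 1.0)
    t_max = t_max_start + (t_max_end - t_max_start) * frac
    return t_min, max(t_max, t_min)


def sample_sds_timestep(rng, step, total_steps):
    # Draw an integer timestep uniformly from the current (shrinking) range.
    lo, hi = annealed_timestep_range(step, total_steps)
    return int(rng.integers(int(lo * T), int(hi * T) + 1))


def densify_with_random_discard(new_positions, rng, drop_prob=0.5):
    # Keep each newly created Gaussian with probability 1 - drop_prob,
    # so only a random subset of densified primitives survives.
    keep = rng.random(len(new_positions)) >= drop_prob
    return new_positions[keep]


rng = np.random.default_rng(0)
t_early = sample_sds_timestep(rng, step=0, total_steps=1000)
t_late = sample_sds_timestep(rng, step=1000, total_steps=1000)
kept = densify_with_random_discard(np.zeros((1000, 3)), rng, drop_prob=0.5)
```

Lower timesteps correspond to less noise added to the rendering, so narrowing the range late in training limits how far the diffusion prior can push fine details; the random discard simply thins out primitives that SDS noise would otherwise keep spawning.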
