
Xuyi Meng

Zero-1-to-G: Taming Pretrained 2D Diffusion Model for Direct 3D Generation

Jan 09, 2025

Xuyi Meng, Chen Wang, Jiahui Lei, Kostas Daniilidis, Jiatao Gu, Lingjie Liu


Abstract: Recent advances in 2D image generation have achieved remarkable quality, largely driven by the capacity of diffusion models and the availability of large-scale datasets. However, direct 3D generation is still constrained by the scarcity and lower fidelity of 3D datasets. In this paper, we introduce Zero-1-to-G, a novel approach that addresses this problem by enabling direct single-view generation on Gaussian splats using pretrained 2D diffusion models. Our key insight is that Gaussian splats, a 3D representation, can be decomposed into multi-view images encoding different attributes. This reframes the challenging task of direct 3D generation within a 2D diffusion framework, allowing us to leverage the rich priors of pretrained 2D diffusion models. To incorporate 3D awareness, we introduce cross-view and cross-attribute attention layers, which capture complex correlations and enforce 3D consistency across generated splats. This makes Zero-1-to-G the first direct image-to-3D generative model to effectively utilize pretrained 2D diffusion priors, enabling efficient training and improved generalization to unseen objects. Extensive experiments on both synthetic and in-the-wild datasets demonstrate superior performance in 3D object generation, offering a new approach to high-quality 3D generation.
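The abstract's idea of cross-view and cross-attribute attention can be illustrated with a minimal sketch of how tokens might be grouped before attention. This is not the authors' implementation: the attribute names, the `(view, attribute, token)` layout, and all function names below are hypothetical, chosen only to show that one grouping spans views for a fixed attribute and the other spans attributes for a fixed view.

```python
from typing import Dict, List, Tuple

# Hypothetical token identifier: (view index, attribute name, token index).
Token = Tuple[int, str, int]


def make_tokens(num_views: int, attributes: List[str], tokens_per_image: int) -> List[Token]:
    """Enumerate the tokens of every (view, attribute) image."""
    return [
        (v, a, t)
        for v in range(num_views)
        for a in attributes
        for t in range(tokens_per_image)
    ]


def group_cross_view(tokens: List[Token]) -> Dict[str, List[Token]]:
    """One attention sequence per attribute, spanning all views."""
    groups: Dict[str, List[Token]] = {}
    for tok in tokens:
        groups.setdefault(tok[1], []).append(tok)
    return groups


def group_cross_attribute(tokens: List[Token]) -> Dict[int, List[Token]]:
    """One attention sequence per view, spanning all attributes."""
    groups: Dict[int, List[Token]] = {}
    for tok in tokens:
        groups.setdefault(tok[0], []).append(tok)
    return groups


# Assumed toy sizes: 4 views, 3 attribute images per view, 2 tokens per image.
tokens = make_tokens(num_views=4, attributes=["rgb", "scale", "rotation"], tokens_per_image=2)
cross_view = group_cross_view(tokens)
cross_attr = group_cross_attribute(tokens)
```

In this toy setup, each cross-view sequence mixes tokens from all four views of one attribute, while each cross-attribute sequence mixes the three attribute images of one view; a real model would run attention within each such sequence.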


LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

Mar 18, 2024

Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, Chen Change Loy


Abstract: The field of neural rendering has witnessed significant progress with advancements in generative models and differentiable rendering techniques. Though 2D diffusion has achieved success, a unified 3D diffusion pipeline remains unsettled. This paper introduces a novel framework called LN3Diff to address this gap and enable fast, high-quality, and generic conditional 3D generation. Our approach harnesses a 3D-aware architecture and variational autoencoder (VAE) to encode the input image into a structured, compact, and 3D latent space. The latent is decoded by a transformer-based decoder into a high-capacity 3D neural field. Through training a diffusion model on this 3D-aware latent space, our method achieves state-of-the-art performance on ShapeNet for 3D generation and demonstrates superior performance in monocular 3D reconstruction and conditional 3D generation across various datasets. Moreover, it surpasses existing 3D diffusion methods in terms of inference speed, requiring no per-instance optimization. Our proposed LN3Diff presents a significant advancement in 3D generative modeling and holds promise for various applications in 3D vision and graphics tasks.

* project webpage: https://nirvanalan.github.io/projects/ln3diff/


Self-Supervised Geometry-Aware Encoder for Style-Based 3D GAN Inversion

Dec 15, 2022

Yushi Lan, Xuyi Meng, Shuai Yang, Chen Change Loy, Bo Dai


Abstract: StyleGAN has achieved great progress in 2D face reconstruction and semantic editing via image inversion and latent editing. While studies on extending 2D StyleGAN to 3D faces have emerged, a corresponding generic 3D GAN inversion framework is still missing, limiting the applications of 3D face reconstruction and semantic editing. In this paper, we study the challenging problem of 3D GAN inversion, where a latent code is predicted from a single face image to faithfully recover its 3D shape and detailed textures. The problem is ill-posed: innumerable combinations of shape and texture could render to the same image. Furthermore, with the limited capacity of a global latent code, 2D inversion methods cannot preserve faithful shape and texture at the same time when applied to 3D models. To solve this problem, we devise an effective self-training scheme to constrain the learning of inversion. The learning is done efficiently using proxy samples generated from a 3D GAN, without any real-world 2D-3D training pairs. In addition, apart from a global latent code that captures the coarse shape and texture information, we augment the generation network with a local branch, where pixel-aligned features are added to faithfully reconstruct face details. We further consider a new pipeline to perform 3D view-consistent editing. Extensive experiments show that our method outperforms state-of-the-art inversion methods in both shape and texture reconstruction quality. Code and data will be released.

* An encoder-based 3D GAN inversion method. Project page: https://nirvanalan.github.io/projects/E3DGE/index.html
