Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shengjun Zhang

Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model

Apr 03, 2025

Shengjun Zhang, Jinzhao Li, Xin Fei, Hao Liu, Yueqi Duan

Abstract:In this paper, we propose Scene Splatter, a momentum-based paradigm for video diffusion to generate generic scenes from single image. Existing methods, which employ video generation models to synthesize novel views, suffer from limited video length and scene inconsistency, leading to artifacts and distortions during further reconstruction. To address this issue, we construct noisy samples from original features as momentum to enhance video details and maintain scene consistency. However, for latent features with the perception field that spans both known and unknown regions, such latent-level momentum restricts the generative ability of video diffusion in unknown regions. Therefore, we further introduce the aforementioned consistent video as a pixel-level momentum to a directly generated video without momentum for better recovery of unseen regions. Our cascaded momentum enables video diffusion models to generate both high-fidelity and consistent novel views. We further finetune the global Gaussian representations with enhanced frames and render new frames for momentum update in the next step. In this manner, we can iteratively recover a 3D scene, avoiding the limitation of video length. Extensive experiments demonstrate the generalization capability and superior performance of our method in high-fidelity and consistent scene generation.

* CVPR 2025

Via

Access Paper or Ask Questions

Gaussian Graph Network: Learning Efficient and Generalizable Gaussian Representations from Multi-view Images

Mar 20, 2025

Shengjun Zhang, Xin Fei, Fangfu Liu, Haixu Song, Yueqi Duan

Abstract:3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis performance. While conventional methods require per-scene optimization, more recently several feed-forward methods have been proposed to generate pixel-aligned Gaussian representations with a learnable network, which are generalizable to different scenes. However, these methods simply combine pixel-aligned Gaussians from multiple views as scene representations, thereby leading to artifacts and extra memory cost without fully capturing the relations of Gaussians from different images. In this paper, we propose Gaussian Graph Network (GGN) to generate efficient and generalizable Gaussian representations. Specifically, we construct Gaussian Graphs to model the relations of Gaussian groups from different views. To support message passing at Gaussian level, we reformulate the basic graph operations over Gaussian representations, enabling each Gaussian to benefit from its connected Gaussian groups with Gaussian feature fusion. Furthermore, we design a Gaussian pooling layer to aggregate various Gaussian groups for efficient representations. We conduct experiments on the large-scale RealEstate10K and ACID datasets to demonstrate the efficiency and generalization of our method. Compared to the state-of-the-art methods, our model uses fewer Gaussians and achieves better image quality with higher rendering speed.

* NeurIPS 2024

Via

Access Paper or Ask Questions

Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion

Jun 07, 2024

Fangfu Liu, Hanyang Wang, Shunyu Yao, Shengjun Zhang, Jie Zhou, Yueqi Duan

Abstract:In recent years, there has been rapid development in 3D generation models, opening up new possibilities for applications such as simulating the dynamic movements of 3D objects and customizing their behaviors. However, current 3D generative models tend to focus only on surface features such as color and shape, neglecting the inherent physical properties that govern the behavior of objects in the real world. To accurately simulate physics-aligned dynamics, it is essential to predict the physical properties of materials and incorporate them into the behavior prediction process. Nonetheless, predicting the diverse materials of real-world objects is still challenging due to the complex nature of their physical attributes. In this paper, we propose \textbf{Physics3D}, a novel method for learning various physical properties of 3D objects through a video diffusion model. Our approach involves designing a highly generalizable physical simulation system based on a viscoelastic material model, which enables us to simulate a wide range of materials with high-fidelity capabilities. Moreover, we distill the physical priors from a video diffusion model that contains more understanding of realistic object materials. Extensive experiments demonstrate the effectiveness of our method with both elastic and plastic materials. Physics3D shows great potential for bridging the gap between the physical world and virtual neural space, providing a better integration and application of realistic physical principles in virtual environments. Project page: https://liuff19.github.io/Physics3D.

* Project page: https://liuff19.github.io/Physics3D; corrected author list

Via

Access Paper or Ask Questions

GeoAuxNet: Towards Universal 3D Representation Learning for Multi-sensor Point Clouds

Mar 28, 2024

Shengjun Zhang, Xin Fei, Yueqi Duan

Abstract:Point clouds captured by different sensors such as RGB-D cameras and LiDAR possess non-negligible domain gaps. Most existing methods design different network architectures and train separately on point clouds from various sensors. Typically, point-based methods achieve outstanding performances on even-distributed dense point clouds from RGB-D cameras, while voxel-based methods are more efficient for large-range sparse LiDAR point clouds. In this paper, we propose geometry-to-voxel auxiliary learning to enable voxel representations to access point-level geometric information, which supports better generalisation of the voxel-based backbone with additional interpretations of multi-sensor point clouds. Specifically, we construct hierarchical geometry pools generated by a voxel-guided dynamic point network, which efficiently provide auxiliary fine-grained geometric information adapted to different stages of voxel features. We conduct experiments on joint multi-sensor datasets to demonstrate the effectiveness of GeoAuxNet. Enjoying elaborate geometric information, our method outperforms other models collectively trained on multi-sensor datasets, and achieve competitive results with the-state-of-art experts on each single dataset.

* CVPR 2024

Via

Access Paper or Ask Questions

Convergence Analysis of Nonconvex Distributed Stochastic Zeroth-order Coordinate Method

Mar 24, 2021

Shengjun Zhang, Yunlong Dong, Dong Xie, Lisha Yao, Colleen P. Bailey, Shengli Fu

Figure 1 for Convergence Analysis of Nonconvex Distributed Stochastic Zeroth-order Coordinate Method

Figure 2 for Convergence Analysis of Nonconvex Distributed Stochastic Zeroth-order Coordinate Method

Figure 3 for Convergence Analysis of Nonconvex Distributed Stochastic Zeroth-order Coordinate Method

Abstract:This paper investigates the stochastic distributed nonconvex optimization problem of minimizing a global cost function formed by the summation of $n$ local cost functions. We solve such a problem by involving zeroth-order (ZO) information exchange. In this paper, we propose a ZO distributed primal-dual coordinate method (ZODIAC) to solve the stochastic optimization problem. Agents approximate their own local stochastic ZO oracle along with coordinates with an adaptive smoothing parameter. We show that the proposed algorithm achieves the convergence rate of $\mathcal{O}(\sqrt{p}/\sqrt{T})$ for general nonconvex cost functions. We demonstrate the efficiency of proposed algorithms through a numerical example in comparison with the existing state-of-the-art centralized and distributed ZO algorithms.

Via

Access Paper or Ask Questions

Extremal Region Analysis based Deep Learning Framework for Detecting Defects

Mar 19, 2020

Zelin Deng, Xiaolong Yan, Shengjun Zhang, Colleen P. Bailey

Figure 1 for Extremal Region Analysis based Deep Learning Framework for Detecting Defects

Figure 2 for Extremal Region Analysis based Deep Learning Framework for Detecting Defects

Figure 3 for Extremal Region Analysis based Deep Learning Framework for Detecting Defects

Figure 4 for Extremal Region Analysis based Deep Learning Framework for Detecting Defects

Abstract:A maximally stable extreme region (MSER) analysis based convolutional neural network (CNN) for unified defect detection framework is proposed in this paper. Our proposed framework utilizes the generality and stability of MSER to generate the desired defect candidates. Then a specific trained binary CNN classifier is adopted over the defect candidates to produce the final defect set. Defect datasets over different categories \blue{are used} in the experiments. More generally, the parameter settings in MSER can be adjusted to satisfy different requirements in various industries (high precision, high recall, etc). Extensive experimental results have shown the efficacy of the proposed framework.

Via

Access Paper or Ask Questions