Binbin Huang

GenFusion: Closing the Loop between Reconstruction and Generation via Videos

Mar 27, 2025

Sibo Wu, Congrong Xu, Binbin Huang, Andreas Geiger, Anpei Chen

Abstract:Recently, 3D reconstruction and generation have demonstrated impressive novel view synthesis results, achieving high fidelity and efficiency. However, a notable conditioning gap can be observed between these two fields, e.g., scalable 3D scene reconstruction often requires densely captured views, whereas 3D generation typically relies on a single or no input view, which significantly limits their applications. We found that the source of this phenomenon lies in the misalignment between 3D constraints and generative priors. To address this problem, we propose a reconstruction-driven video diffusion model that learns to condition video frames on artifact-prone RGB-D renderings. Moreover, we propose a cyclical fusion pipeline that iteratively adds restoration frames from the generative model to the training set, enabling progressive expansion and addressing the viewpoint saturation limitations seen in previous reconstruction and generation pipelines. Our evaluation, including view synthesis from sparse view and masked input, validates the effectiveness of our approach.

Quadratic Gaussian Splatting for Efficient and Detailed Surface Reconstruction

Nov 25, 2024

Ziyu Zhang, Binbin Huang, Hanqing Jiang, Liyang Zhou, Xiaojun Xiang, Shunhan Shen

Abstract:Recently, 3D Gaussian Splatting (3DGS) has attracted attention for its superior rendering quality and speed over Neural Radiance Fields (NeRF). To address 3DGS's limitations in surface representation, 2D Gaussian Splatting (2DGS) introduced disks as scene primitives to model and reconstruct geometries from multi-view images, offering view-consistent geometry. However, the disk's first-order linear approximation often leads to over-smoothed results. We propose Quadratic Gaussian Splatting (QGS), a novel method that replaces disks with quadric surfaces, enhancing geometric fitting. QGS defines Gaussian distributions in non-Euclidean space, allowing primitives to capture more complex textures. As a second-order surface approximation, QGS also renders spatial curvature to guide the normal consistency term, effectively reducing over-smoothing. Moreover, QGS is a generalized version of 2DGS that achieves more accurate and detailed reconstructions, as verified by experiments on DTU and TNT, where it surpasses current state-of-the-art methods in geometry reconstruction. Our code will be released as open source.

Space-time 2D Gaussian Splatting for Accurate Surface Reconstruction under Complex Dynamic Scenes

Sep 27, 2024

Shuo Wang, Binbin Huang, Ruoyu Wang, Shenghua Gao

Abstract:Previous surface reconstruction methods either suffer from low geometric accuracy or lengthy training times when dealing with real-world complex dynamic scenes involving multi-person activities and human-object interactions. To tackle the dynamic contents and the occlusions in complex scenes, we present a space-time 2D Gaussian Splatting approach. Specifically, to improve geometric quality in dynamic scenes, we learn canonical 2D Gaussian splats and deform these 2D Gaussian splats while constraining the Gaussian disks to lie on the surface of the objects by introducing depth and normal regularizers. Further, to tackle the occlusion issues in complex scenes, we introduce a compositional opacity deformation strategy, which further reduces the surface recovery of those occluded areas. Experiments on real-world sparse-view video datasets and monocular dynamic datasets demonstrate that our reconstructions outperform state-of-the-art methods, especially in fine surface details. The project page and more visualizations can be found at: https://tb2-sy.github.io/st-2dgs/.

* Project page: https://tb2-sy.github.io/st-2dgs/

GeoFormer: Learning Point Cloud Completion with Tri-Plane Integrated Transformer

Aug 13, 2024

Jinpeng Yu, Binbin Huang, Yuxuan Zhang, Huaxia Li, Xu Tang, Shenghua Gao

Abstract:Point cloud completion aims to recover accurate global geometry and preserve fine-grained local details from partial point clouds. Conventional methods typically predict unseen points directly from 3D point cloud coordinates or use self-projected multi-view depth maps to ease this task. However, these gray-scale depth maps cannot reach multi-view consistency, consequently restricting the performance. In this paper, we introduce GeoFormer, which simultaneously enhances the global geometric structure of the points and improves the local details. Specifically, we design a CCM Feature Enhanced Point Generator to integrate image features from multi-view consistent canonical coordinate maps (CCMs) and align them with pure point features, thereby enhancing the global geometry feature. Additionally, we employ the Multi-scale Geometry-aware Upsampler module to progressively enhance local details. This is achieved through cross attention between the multi-scale features extracted from the partial input and the features derived from previously estimated points. Extensive experiments on the PCN, ShapeNet-55/34, and KITTI benchmarks demonstrate that our GeoFormer outperforms recent methods, achieving state-of-the-art performance. Our code is available at https://github.com/Jinpeng-Yu/GeoFormer.

* Accepted by the 32nd ACM International Conference on Multimedia (MM'24)

Surfel-based Gaussian Inverse Rendering for Fast and Relightable Dynamic Human Reconstruction from Monocular Video

Jul 23, 2024

Yiqun Zhao, Chenming Wu, Binbin Huang, Yihao Zhi, Chen Zhao, Jingdong Wang, Shenghua Gao

Abstract:Efficient and accurate reconstruction of a relightable, dynamic clothed human avatar from a monocular video is crucial for the entertainment industry. This paper introduces the Surfel-based Gaussian Inverse Avatar (SGIA) method, which enables efficient training and rendering for relightable dynamic human reconstruction. SGIA advances previous Gaussian Avatar methods by comprehensively modeling Physically-Based Rendering (PBR) properties for clothed human avatars, allowing for the manipulation of avatars into novel poses under diverse lighting conditions. Specifically, our approach integrates pre-integration and image-based lighting for fast light calculations that surpass the performance of existing implicit-based techniques. To address challenges related to material lighting disentanglement and accurate geometry reconstruction, we propose an innovative occlusion approximation strategy and a progressive training approach. Extensive experiments demonstrate that SGIA not only achieves highly accurate physical properties but also significantly enhances the realistic relighting of dynamic human avatars, providing a substantial speed advantage. More results are available on our project page: https://GS-IA.github.io.

* Under Review; Project Page: https://GS-IA.github.io

Continual Learning for Temporal-Sensitive Question Answering

Jul 17, 2024

Wanqi Yang, Yunqiu Xu, Yanda Li, Kunze Wang, Binbin Huang, Ling Chen

Abstract:In this study, we explore an emerging research area of Continual Learning for Temporal Sensitive Question Answering (CLTSQA). Previous research has primarily focused on Temporal Sensitive Question Answering (TSQA), often overlooking the unpredictable nature of future events. In real-world applications, it's crucial for models to continually acquire knowledge over time, rather than relying on a static, complete dataset. Our paper investigates strategies that enable models to adapt to the ever-evolving information landscape, thereby addressing the challenges inherent in CLTSQA. To support our research, we first create a novel dataset, divided into five subsets, designed specifically for various stages of continual learning. We then propose a training framework for CLTSQA that integrates temporal memory replay and temporal contrastive learning. Our experimental results highlight two significant insights: First, the CLTSQA task introduces unique challenges for existing models. Second, our proposed framework effectively navigates these challenges, resulting in improved performance.

* Accepted by IJCNN 2024

2D Gaussian Splatting for Geometrically Accurate Radiance Fields

Mar 26, 2024

Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, Shenghua Gao

Abstract:3D Gaussian Splatting (3DGS) has recently revolutionized radiance field reconstruction, achieving high quality novel view synthesis and fast rendering speed without baking. However, 3DGS fails to accurately represent surfaces due to the multi-view inconsistent nature of 3D Gaussians. We present 2D Gaussian Splatting (2DGS), a novel approach to model and reconstruct geometrically accurate radiance fields from multi-view images. Our key idea is to collapse the 3D volume into a set of 2D oriented planar Gaussian disks. Unlike 3D Gaussians, 2D Gaussians provide view-consistent geometry while modeling surfaces intrinsically. To accurately recover thin surfaces and achieve stable optimization, we introduce a perspective-accurate 2D splatting process utilizing ray-splat intersection and rasterization. Additionally, we incorporate depth distortion and normal consistency terms to further enhance the quality of the reconstructions. We demonstrate that our differentiable renderer allows for noise-free and detailed geometry reconstruction while maintaining competitive appearance quality, fast training speed, and real-time rendering. Our code will be made publicly available.

* 12 pages, 12 figures
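
The idea of evaluating a planar Gaussian disk where a camera ray meets its plane can be illustrated with a small sketch. This is not the authors' renderer: it uses a plain ray-plane intersection, assumes the two tangent vectors are orthonormal, and every name in it (`ray_disk_gaussian`, `t_u`, `t_v`, `s_u`, `s_v`) is hypothetical.

```python
import numpy as np

def ray_disk_gaussian(o, d, center, t_u, t_v, s_u, s_v):
    """Evaluate a planar 2D Gaussian where the ray o + t*d hits its plane.

    t_u and t_v are assumed to be orthonormal tangent vectors of the disk;
    s_u and s_v are the standard deviations along them.
    Returns (t, value), or None if the ray is parallel to the plane
    or the intersection lies behind the ray origin.
    """
    o, d, center, t_u, t_v = (
        np.asarray(x, dtype=float) for x in (o, d, center, t_u, t_v)
    )
    normal = np.cross(t_u, t_v)          # disk normal
    denom = np.dot(normal, d)
    if abs(denom) < 1e-9:                # ray parallel to the disk plane
        return None
    t = np.dot(normal, center - o) / denom
    if t <= 0:                           # intersection behind the origin
        return None
    hit = o + t * d
    # Coordinates of the hit point in the disk's local (u, v) frame,
    # measured in units of standard deviation.
    u = np.dot(hit - center, t_u) / s_u
    v = np.dot(hit - center, t_v) / s_v
    return float(t), float(np.exp(-0.5 * (u * u + v * v)))
```

Because the Gaussian is evaluated at the actual intersection point on the disk, the value depends only on where the ray meets the surface, which is one way to read the abstract's claim of view-consistent geometry.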

Mip-Splatting: Alias-free 3D Gaussian Splatting

Nov 27, 2023

Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, Andreas Geiger

Abstract:Recently, 3D Gaussian Splatting has demonstrated impressive novel view synthesis results, reaching high fidelity and efficiency. However, strong artifacts can be observed when changing the sampling rate, e.g., by changing focal length or camera distance. We find that the source for this phenomenon can be attributed to the lack of 3D frequency constraints and the usage of a 2D dilation filter. To address this problem, we introduce a 3D smoothing filter which constrains the size of the 3D Gaussian primitives based on the maximal sampling frequency induced by the input views, eliminating high-frequency artifacts when zooming in. Moreover, replacing 2D dilation with a 2D Mip filter, which simulates a 2D box filter, effectively mitigates aliasing and dilation issues. Our evaluation, including scenarios such as training on single-scale images and testing on multiple scales, validates the effectiveness of our approach.

* Project page: https://niujinshuchong.github.io/mip-splatting/
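
The difference between dilating a projected 2D Gaussian and low-pass filtering it can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the constants `0.3` and `0.1` are assumptions chosen for the example. It relies only on the fact that convolving a Gaussian of covariance Σ with a normalized isotropic Gaussian of variance s gives covariance Σ + sI and scales the peak by sqrt(det Σ / det(Σ + sI)).

```python
import numpy as np

def dilate_2d(cov2d, opacity, s=0.3):
    """Dilation: enlarge the screen-space covariance, keep opacity as is.

    A tiny Gaussian becomes larger without becoming fainter, so its
    total contribution to the image grows.
    """
    cov2d = np.asarray(cov2d, dtype=float)
    return cov2d + s * np.eye(2), opacity

def mip_filter_2d(cov2d, opacity, s=0.1):
    """Gaussian low-pass filter: enlarge the covariance AND rescale opacity.

    The rescaling factor sqrt(det(cov) / det(cov + s*I)) keeps the
    integrated energy of the splat unchanged after the convolution.
    """
    cov2d = np.asarray(cov2d, dtype=float)
    filtered = cov2d + s * np.eye(2)
    scale = np.sqrt(np.linalg.det(cov2d) / np.linalg.det(filtered))
    return filtered, opacity * scale
```

For a splat much smaller than a pixel the rescaling factor is well below one, so the splat fades instead of swelling; for a splat much larger than a pixel the factor approaches one and the filter has almost no effect.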

TSP-Transformer: Task-Specific Prompts Boosted Transformer for Holistic Scene Understanding

Nov 06, 2023

Shuo Wang, Jing Li, Zibo Zhao, Dongze Lian, Binbin Huang, Xiaomei Wang, Zhengxin Li, Shenghua Gao

Abstract:Holistic scene understanding includes semantic segmentation, surface normal estimation, object boundary detection, depth estimation, etc. The key aspect of this problem is to learn representation effectively, as each subtask builds upon not only correlated but also distinct attributes. Inspired by visual-prompt tuning, we propose a Task-Specific Prompts Transformer, dubbed TSP-Transformer, for holistic scene understanding. It features a vanilla transformer in the early stage and a task-specific prompts transformer encoder in the later stage, where task-specific prompts are augmented. By doing so, the transformer layer learns the generic information from the shared parts and is endowed with task-specific capacity. First, the task-specific prompts effectively serve as induced priors for each task. Moreover, the task-specific prompts can be seen as switches to favor task-specific representation learning for different tasks. Extensive experiments on NYUD-v2 and PASCAL-Context show that our method achieves state-of-the-art performance, validating the effectiveness of our method for holistic scene understanding. Our code is available at https://github.com/tb2-sy/TSP-Transformer.

* WACV 2024

Omni-Line-of-Sight Imaging for Holistic Shape Reconstruction

Apr 21, 2023

Binbin Huang, Xingyue Peng, Siyuan Shen, Suan Xia, Ruiqian Li, Yanhua Yu, Yuehan Wang, Shenghua Gao, Wenzheng Chen, Shiying Li (+1 more)

Abstract:We introduce Omni-LOS, a neural computational imaging method for conducting holistic shape reconstruction (HSR) of complex objects utilizing a Single-Photon Avalanche Diode (SPAD)-based time-of-flight sensor. As illustrated in Fig. 1, our method enables new capabilities to reconstruct near-360° surrounding geometry of an object from a single scan spot. In such a scenario, traditional line-of-sight (LOS) imaging methods only see the front part of the object and typically fail to recover the occluded back regions. Inspired by recent advances in non-line-of-sight (NLOS) imaging techniques, which have demonstrated great power to reconstruct occluded objects, Omni-LOS marries LOS and NLOS together, leveraging their complementary advantages to jointly recover the holistic shape of the object from a single scan position. The core of our method is to place the object near diffuse walls and augment the LOS scan in the front view with the NLOS scans from the surrounding walls, which serve as virtual "mirrors" to trap light toward the object. Instead of separately recovering the LOS and NLOS signals, we adopt an implicit neural network to represent the object, analogous to NeRF and NeTF. While transients are measured along straight rays in LOS but over the spherical wavefronts in NLOS, we derive differentiable ray propagation models to simultaneously model both types of transient measurements so that the NLOS reconstruction also takes into account the direct LOS measurements and vice versa. We further develop a proof-of-concept Omni-LOS hardware prototype for real-world validation. Comprehensive experiments on various wall settings demonstrate that Omni-LOS successfully resolves shape ambiguities caused by occlusions, achieves high-fidelity 3D scan quality, and manages to recover objects of various scales and complexity.
