Precisely perceiving the geometric and semantic properties of real-world 3D objects is crucial for the continued evolution of augmented reality and robotic applications. To this end, we present \algfull{} (\algname{}), which incorporates vision-language embeddings of foundation models into 3D Gaussian Splatting (GS). The key contribution of this work is an efficient method to reconstruct and represent 3D vision-language models. This is achieved by distilling feature maps generated by image-based foundation models into those rendered from our 3D model. To ensure high-quality rendering and fast training, we introduce a novel scene representation that integrates the strengths of GS and multi-resolution hash encodings (MHE). Our training procedure also introduces a pixel-alignment loss that pulls the rendered features of pixels belonging to the same semantic entity closer together, following pixel-level semantic boundaries. Our results demonstrate remarkable multi-view semantic consistency and facilitate diverse downstream tasks: we outperform state-of-the-art methods by $\mathbf{10.2}$ percent on open-vocabulary language-based object detection while being $\mathbf{851\times}$ faster at inference. This research explores the intersection of vision, language, and 3D scene representation, paving the way for enhanced scene understanding in uncontrolled real-world environments. We plan to release the code upon paper acceptance.
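
To make the distillation and pixel-alignment ideas above concrete, the following is a minimal sketch in our own notation; the symbols $\hat{F}$, $F_{\mathrm{fm}}$, $\mathcal{P}$, $\mathcal{S}$, and $\lambda$, as well as the specific distance choices, are illustrative assumptions rather than the paper's exact formulation:
\[
\mathcal{L} \;=\; \underbrace{\frac{1}{|\mathcal{P}|} \sum_{p \in \mathcal{P}} \big\| \hat{F}(p) - F_{\mathrm{fm}}(p) \big\|_{1}}_{\mbox{\small feature distillation}}
\;+\; \lambda\, \underbrace{\frac{1}{|\mathcal{S}|} \sum_{(p,q) \in \mathcal{S}} \Big( 1 - \cos\big( \hat{F}(p), \hat{F}(q) \big) \Big)}_{\mbox{\small pixel alignment}},
\]
where $\hat{F}(p)$ denotes the feature rendered from the hybrid GS--MHE representation at pixel $p$, $F_{\mathrm{fm}}(p)$ the corresponding foundation-model feature, $\mathcal{P}$ the set of training pixels, $\mathcal{S}$ the set of pixel pairs lying within the same semantic segment, and $\lambda$ a weighting factor. The first term distills 2D foundation-model features into the 3D representation; the second encourages rendered features of the same semantic entity to be close, respecting pixel-level semantic boundaries.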