Title: Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models

May 16, 2023

Zhimin Chen, Bing Li

Abstract: Foundation models have made significant strides in 2D and language tasks such as image segmentation, object detection, and visual-language understanding. Nevertheless, their potential to enhance 3D scene representation learning remains largely untapped due to the domain gap. In this paper, we propose an innovative methodology, Bridge3D, to address this gap by pre-training 3D models using features, semantic masks, and captions sourced from foundation models. Specifically, our approach uses semantic masks from these models to guide the masking and reconstruction process in the masked autoencoder. This strategy enables the network to concentrate more on foreground objects, thereby enhancing 3D representation learning. Additionally, we bridge the 3D-text gap at the scene level by harnessing image captioning foundation models. To further facilitate knowledge distillation from well-learned 2D and text representations to the 3D model, we introduce a novel method that employs foundation models to generate highly accurate object-level masks and semantic text information at the object level. Our approach notably outperforms state-of-the-art methods in 3D object detection and semantic segmentation tasks. For instance, on the ScanNet dataset, our method surpasses the previous state-of-the-art method, PiMAE, by a significant margin of 5.3%.
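
The abstract does not spell out how semantic masks guide the masking step, so the following is only an illustrative sketch of one possible reading: point patches labeled as foreground are sampled for masking more often than background patches, so that reconstruction effort is biased toward objects. The function name `foreground_guided_mask`, the parameters `mask_ratio` and `fg_weight`, and the per-patch `is_foreground` labels are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def foreground_guided_mask(is_foreground, mask_ratio=0.6, fg_weight=3.0, rng=None):
    """Choose which patches to mask, favoring foreground patches.

    is_foreground: boolean array of shape (num_patches,). Hypothetical labels,
        e.g. obtained by projecting 2D semantic masks onto 3D point patches.
    mask_ratio: fraction of patches to mask (assumed value, not from the paper).
    fg_weight: relative sampling weight of a foreground patch versus a
        background patch (assumed value, not from the paper).
    Returns a boolean array of shape (num_patches,), True where masked.
    """
    rng = np.random.default_rng() if rng is None else rng
    is_foreground = np.asarray(is_foreground, dtype=bool)
    num_patches = is_foreground.shape[0]
    num_masked = int(round(mask_ratio * num_patches))

    # Foreground patches get a larger share of the sampling probability.
    weights = np.where(is_foreground, fg_weight, 1.0)
    probs = weights / weights.sum()

    # Sample without replacement so exactly num_masked distinct patches are masked.
    masked_idx = rng.choice(num_patches, size=num_masked, replace=False, p=probs)
    mask = np.zeros(num_patches, dtype=bool)
    mask[masked_idx] = True
    return mask


if __name__ == "__main__":
    labels = np.array([True] * 4 + [False] * 6)
    print(foreground_guided_mask(labels, rng=np.random.default_rng(0)))
```

With `fg_weight=1.0` this reduces to the uniform random masking of a standard masked autoencoder; larger values shift more of the masked (and therefore reconstructed) patches onto foreground objects.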
