Title: Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding

Sep 05, 2024

Yunze Man, Shuhong Zheng, Zhipeng Bao, Martial Hebert, Liang-Yan Gui, Yu-Xiong Wang

Abstract: Complex 3D scene understanding has gained increasing attention, with scene encoding strategies playing a crucial role in this success. However, the optimal scene encoding strategies for various scenarios remain unclear, particularly compared to their image-based counterparts. To address this issue, we present a comprehensive study that probes various visual encoding models for 3D scene understanding, identifying the strengths and limitations of each model across different scenarios. Our evaluation spans seven vision foundation encoders, including image-based, video-based, and 3D foundation models. We evaluate these models in four tasks: Vision-Language Scene Reasoning, Visual Grounding, Segmentation, and Registration, each focusing on different aspects of scene understanding. Our evaluations yield key findings: DINOv2 demonstrates superior performance, video models excel in object-level tasks, diffusion models benefit geometric tasks, and language-pretrained models show unexpected limitations in language-related tasks. These insights challenge some conventional understandings, provide novel perspectives on leveraging visual foundation models, and highlight the need for more flexible encoder selection in future vision-language and scene-understanding tasks.

* Project page: https://yunzeman.github.io/lexicon3d, GitHub: https://github.com/YunzeMan/Lexicon3D
