Title: How to Enable LLM with 3D Capacity? A Survey of Spatial Reasoning in LLM

Apr 08, 2025

Jirong Zha, Yuxuan Fan, Xiao Yang, Chen Gao, Xinlei Chen

Abstract: 3D spatial understanding is essential in real-world applications such as robotics, autonomous vehicles, virtual reality, and medical imaging. Recently, Large Language Models (LLMs), having demonstrated remarkable success across various domains, have been leveraged to enhance 3D understanding tasks, showing potential to surpass traditional computer vision methods. In this survey, we present a comprehensive review of methods integrating LLMs with 3D spatial understanding. We propose a taxonomy that categorizes existing methods into three branches: image-based methods deriving 3D understanding from 2D visual data, point cloud-based methods working directly with 3D representations, and hybrid modality-based methods combining multiple data streams. We systematically review representative methods along these categories, covering data representations, architectural modifications, and training strategies that bridge textual and 3D modalities. Finally, we discuss current limitations, including dataset scarcity and computational challenges, while highlighting promising research directions in spatial perception, multi-modal fusion, and real-world applications.

* 9 pages, 5 figures
