
SpatialLM: Training Large Language Models for Structured Indoor Modeling

Yongsen Mao, Junhao Zhong, Chuan Fang, Jia Zheng, Rui Tang, Hao Zhu, Ping Tan, Zihan Zhou
Published: 6/10/2025

Abstract

SpatialLM is a large language model designed to process 3D point cloud data and generate structured 3D scene understanding outputs. These outputs include architectural elements like walls, doors, windows, and oriented object boxes with their semantic categories. Unlike previous methods which exploit task-specific network designs, our model adheres to the standard multimodal LLM architecture and is fine-tuned directly from open-source LLMs. To train SpatialLM, we collect a large-scale, high-quality synthetic dataset consisting of the point clouds of 12,328 indoor scenes (54,778 rooms) with ground-truth 3D annotations, and conduct a careful study on various modeling and training decisions. On public benchmarks, our model gives state-of-the-art performance in layout estimation and competitive results in 3D object detection. With that, we show a feasible path for enhancing the spatial understanding capabilities of modern LLMs for applications in augmented reality, embodied robotics, and more.
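To make the abstract's notion of "structured 3D scene understanding outputs" concrete, the sketch below shows one way such a scene could be represented: wall segments, wall-attached openings (doors/windows), and oriented object boxes with semantic categories. This is a minimal, hypothetical Python illustration; the class and field names are assumptions for clarity, not SpatialLM's actual output schema.

```python
# Hypothetical structured-scene schema illustrating the kind of output
# described in the abstract. Names and fields are illustrative assumptions.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Wall:
    # Wall segment defined by two floor-plane endpoints and a height (meters).
    start_xy: Tuple[float, float]
    end_xy: Tuple[float, float]
    height: float


@dataclass
class Opening:
    # Door or window attached to a wall, given by its center and extent.
    kind: str                              # "door" or "window"
    wall_index: int                        # index into the scene's wall list
    center_xyz: Tuple[float, float, float]
    width: float
    height: float


@dataclass
class OrientedBox:
    # Oriented 3D bounding box with a semantic category, e.g. "sofa", "table".
    category: str
    center_xyz: Tuple[float, float, float]
    size_xyz: Tuple[float, float, float]
    yaw: float                             # rotation about the vertical axis, radians


@dataclass
class IndoorScene:
    walls: List[Wall]
    openings: List[Opening]
    objects: List[OrientedBox]


# Example: a single-room scene with one wall, a door on that wall, and a sofa.
scene = IndoorScene(
    walls=[Wall(start_xy=(0.0, 0.0), end_xy=(4.0, 0.0), height=2.8)],
    openings=[Opening("door", 0, (1.0, 0.0, 1.0), width=0.9, height=2.0)],
    objects=[OrientedBox("sofa", (2.0, 1.5, 0.4), (2.0, 0.9, 0.8), yaw=0.0)],
)
```

A flat, text-serializable structure like this is one plausible target for fine-tuning a multimodal LLM, since each element can be emitted as a short token sequence; the paper's actual representation may differ.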