Symphonize 3D Semantic Scene Completion with Contextual Instance Queries

3D Semantic Scene Completion (SSC) has emerged as a nascent yet pivotal task in autonomous driving, aiming to predict the occupancy and semantic labels of voxels within volumetric scenes. However, prevailing methods primarily focus on voxel-wise feature aggregation while neglecting instance semantics and scene context. In this paper, we present a novel paradigm termed Symphonies (Scene-from-Insts), which uses instance queries to orchestrate 2D-to-3D reconstruction and 3D scene modeling. Leveraging our proposed Serial Instance-Propagated Attentions, Symphonies dynamically encodes instance-centric semantics, enabling rich interactions between the image and volumetric domains. At the same time, Symphonies captures holistic scene context by efficiently fusing instance queries, and this contextual scene reasoning alleviates geometric ambiguities such as occlusion and perspective errors. Experimental results demonstrate that Symphonies achieves state-of-the-art performance on the challenging SemanticKITTI and SSCBench-KITTI-360 benchmarks, with mIoU scores of 15.04 and 18.58, respectively. These results highlight the promise of the proposed paradigm. The code is available at https://github.com/hustvl/Symphonies.
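The results above are reported as mIoU (mean intersection-over-union across semantic classes, given in percent). As a rough illustration of how such a score is obtained for labeled voxel grids, the sketch below builds a confusion matrix and averages per-class IoU. It is a minimal sketch under assumed conventions, not the benchmarks' official evaluation code: the function name `voxel_miou`, the use of class 0 for empty space (excluded from the mean), and the ignore label 255 are illustrative assumptions. It returns a fraction rather than a percentage.

```python
import numpy as np


def voxel_miou(pred, target, num_classes, ignore_index=255, empty_class=0):
    """Mean IoU over semantic classes of a labeled voxel grid (sketch).

    Assumed conventions (hypothetical, not taken from the paper):
    * label `empty_class` marks free space and is left out of the mean;
    * voxels whose ground-truth label is `ignore_index` are not scored;
    * classes that appear in neither prediction nor target are skipped;
    * all predicted labels lie in [0, num_classes).
    """
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(target * num_classes + pred,
                       minlength=num_classes * num_classes)
    conf = conf.reshape(num_classes, num_classes)

    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp  # predicted as class c, ground truth differs
    fn = conf.sum(axis=1) - tp  # ground truth is class c, predicted otherwise
    union = tp + fp + fn

    classes = [c for c in range(num_classes)
               if c != empty_class and union[c] > 0]
    if not classes:
        return 0.0
    return float(np.mean(tp[classes] / union[classes]))


# Usage: the last voxel is ignored; classes 1 and 2 each reach IoU 0.5.
print(voxel_miou([0, 1, 2, 2, 1], [0, 1, 1, 2, 255], num_classes=3))
```

In practice, benchmark evaluations typically accumulate the confusion matrix over the whole test split before computing per-class IoU, rather than averaging per-scene scores; the same function applies if the flattened predictions and labels of all scenes are concatenated first.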