NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space

Monocular 3D Semantic Scene Completion (SSC) has garnered significant attention in recent years due to its potential to predict complex semantics and geometry from a single image, requiring no 3D inputs. In this paper, we identify several critical issues in current state-of-the-art methods, including the Feature Ambiguity of 2D features projected along a ray into 3D space, the Pose Ambiguity of the 3D convolution, and the Computation Imbalance in the 3D convolution across different depth levels. To address these problems, we devise a novel Normalized Device Coordinates scene completion network (NDC-Scene) that directly extends the 2D feature map to a Normalized Device Coordinates (NDC) space, rather than to the world space, by progressively restoring the depth dimension with deconvolution operations. Experimental results demonstrate that transferring the majority of computation from the target 3D space to the proposed NDC space benefits monocular SSC tasks. Additionally, we design a Depth-Adaptive Dual Decoder to simultaneously upsample and fuse the 2D and 3D feature maps, further improving overall performance. Our extensive experiments confirm that the proposed method consistently outperforms state-of-the-art methods on both the outdoor SemanticKITTI and indoor NYUv2 datasets. Our code is available at https://github.com/Jiawei-Yao0812/NDCScene.
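
To make the distinction between NDC space and the target 3D space concrete, the following is a minimal geometric sketch, not the NDC-Scene implementation. It assumes a pinhole camera, NDC image coordinates u, v in [-1, 1], and a normalized depth d in [0, 1] mapped linearly to metric depth. The function name `ndc_to_camera`, the intrinsics, and the depth range are all hypothetical values chosen for illustration.

```python
import numpy as np


def ndc_to_camera(ndc, fx, fy, cx, cy, width, height, z_near, z_far):
    """Map NDC points (u, v, d) to camera-space points (X, Y, Z).

    Assumed convention (illustrative only): u, v lie in [-1, 1] across the
    image, and d lies in [0, 1], mapped linearly onto [z_near, z_far].
    """
    ndc = np.asarray(ndc, dtype=np.float64)
    u, v, d = ndc[..., 0], ndc[..., 1], ndc[..., 2]

    # NDC image coordinates -> pixel coordinates.
    px = (u + 1.0) * 0.5 * width
    py = (v + 1.0) * 0.5 * height

    # Normalized depth -> metric depth (linear mapping, an assumption).
    z = z_near + d * (z_far - z_near)

    # Pinhole back-projection into camera space.
    x = (px - cx) * z / fx
    y = (py - cy) * z / fy
    return np.stack([x, y, z], axis=-1)


# Hypothetical camera parameters, for illustration only.
cam = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0,
           width=640, height=480, z_near=1.0, z_far=10.0)

# Left and right image edges at the nearest and farthest depth.
near = ndc_to_camera([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]], **cam)
far = ndc_to_camera([[-1.0, 0.0, 1.0], [1.0, 0.0, 1.0]], **cam)

near_width = near[1, 0] - near[0, 0]
far_width = far[1, 0] - far[0, 0]
print("metric width at near plane:", near_width)
print("metric width at far plane:", far_width)
```

Under these assumptions, a regular grid in NDC space corresponds to a frustum in camera space: the same span of NDC cells covers a metric width that grows in proportion to depth.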