
Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion

Li, Bohan ; Deng, Jiajun ; Zhang, Wenyao ; Liang, Zhujin ; Du, Dalong ; Jin, Xin ; Zeng, Wenjun
Abstract

Camera-based 3D semantic scene completion (SSC) is pivotal for predicting complicated 3D layouts with limited 2D image observations. Existing mainstream solutions generally leverage temporal information by roughly stacking history frames to supplement the current frame; such straightforward temporal modeling inevitably diminishes valid clues and increases learning difficulty. To address this problem, we present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion. The primary innovation of this work involves decomposing temporal context learning into two hierarchical steps: (a) cross-frame affinity measurement and (b) affinity-based dynamic refinement. First, to separate critical relevant context from redundant information, we introduce pattern affinity with scale-aware isolation and multiple independent learners for fine-grained contextual correspondence modeling. Subsequently, to dynamically compensate for incomplete observations, we adaptively refine the feature sampling locations based on initially identified locations with high affinity and their neighboring relevant regions. Our method ranks $1^{st}$ on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU on the OpenOccupancy benchmark. Our code is available at https://github.com/Arlo0o/HTCL.
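
The two hierarchical steps named in the abstract can be illustrated with a toy sketch. This is not the HTCL implementation: it assumes plain cosine similarity as a stand-in for the paper's pattern affinity, a 1D row of history locations instead of a 3D feature volume, and a fixed neighbor radius with softmax weighting instead of learned sampling offsets. The function names `cosine_affinity`, `affinity_row`, and `refine_with_neighbors` are hypothetical and chosen only for illustration.

```python
import math

def cosine_affinity(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros).
    Assumption: a simple stand-in for the paper's pattern affinity."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def affinity_row(query, history):
    """Step (a), simplified: affinity of one current-frame feature
    to the feature at every history-frame location."""
    return [cosine_affinity(query, h) for h in history]

def refine_with_neighbors(query, history, top_k=1, radius=1):
    """Step (b), simplified: start from the top-k affinity locations,
    add neighbors within `radius`, and fuse their features with
    softmax weights over the affinities (hypothetical weighting)."""
    aff = affinity_row(query, history)
    seeds = sorted(range(len(history)), key=lambda i: aff[i], reverse=True)[:top_k]
    picked = sorted({
        j
        for i in seeds
        for j in range(max(0, i - radius), min(len(history), i + radius + 1))
    })
    weights = [math.exp(aff[j]) for j in picked]
    total = sum(weights)
    fused = [
        sum(w * history[j][d] for w, j in zip(weights, picked)) / total
        for d in range(len(query))
    ]
    return picked, fused

# Toy data: five history locations with 2-dimensional features.
history = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
query = [1.0, 0.0]
picked, fused = refine_with_neighbors(query, history)
# picked == [1, 2, 3]: the best match (index 2) plus its two neighbors
```

In this sketch the fused feature leans toward the best-matching location while still drawing on its neighbors, which mirrors the abstract's idea of refining sampling locations around high-affinity regions rather than stacking all history features equally.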
