OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction

Zhang, Yunpeng ; Zhu, Zheng ; Du, Dalong
Abstract

Vision-based perception for autonomous driving has undergone a transformation from bird's-eye-view (BEV) representations to 3D semantic occupancy. Compared with BEV planes, 3D semantic occupancy additionally provides structural information along the vertical direction. This paper presents OccFormer, a dual-path transformer network that effectively processes the 3D volume for semantic occupancy prediction. OccFormer achieves a long-range, dynamic, and efficient encoding of the camera-generated 3D voxel features. This is obtained by decomposing the heavy 3D processing into local and global transformer pathways along the horizontal plane. For the occupancy decoder, we adapt the vanilla Mask2Former for 3D semantic occupancy by proposing preserve-pooling and class-guided sampling, which notably mitigate sparsity and class imbalance. Experimental results demonstrate that OccFormer significantly outperforms existing methods for semantic scene completion on the SemanticKITTI dataset and for LiDAR semantic segmentation on the nuScenes dataset. Code is available at https://github.com/zhangyp15/OccFormer.
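
The decomposition into local and global pathways along the horizontal plane can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the identity attention projections, the mean over the height axis, and the additive fusion are all assumptions chosen to keep the example short. The local path applies the same windowed attention to every horizontal slice of the voxel grid, while the global path collapses the height axis into a single BEV plane and attends over all of it.

```python
import numpy as np

def self_attention(tokens):
    """Single-head self-attention with identity projections (illustrative only).
    tokens: (N, C) -> (N, C)"""
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens

def windowed_attention_2d(plane, window):
    """Attention restricted to non-overlapping window x window patches.
    plane: (X, Y, C) -> (X, Y, C)"""
    X, Y, C = plane.shape
    out = np.empty_like(plane)
    for i in range(0, X, window):
        for j in range(0, Y, window):
            patch = plane[i:i + window, j:j + window]
            h, w, _ = patch.shape
            attended = self_attention(patch.reshape(-1, C))
            out[i:i + h, j:j + w] = attended.reshape(h, w, C)
    return out

def dual_path_block(voxels, window=4):
    """Hypothetical dual-path block. voxels: (X, Y, Z, C) float array."""
    X, Y, Z, C = voxels.shape
    # Local path: the same windowed attention on every horizontal slice.
    local = np.stack(
        [windowed_attention_2d(voxels[:, :, z], window) for z in range(Z)],
        axis=2,
    )
    # Global path: collapse the height axis (assumed: mean), then attend
    # over the entire BEV plane.
    bev = voxels.mean(axis=2)  # (X, Y, C)
    global_bev = self_attention(bev.reshape(-1, C)).reshape(X, Y, C)
    # Fusion (assumed: simple sum), broadcasting the BEV output over height.
    return local + global_bev[:, :, None, :]
```

In this sketch both paths only ever attend within 2D planes, so the cost of attention does not grow with the full 3D token count, which is the efficiency argument the abstract makes for the decomposition.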