8 months ago

Computer Vision

3D Machine Vision

Semantic Segmentation

Yunpeng Zhang, Zheng Zhu*, Dalong Du

Abstract

The vision-based perception for autonomous driving has undergone a transformation from the bird-eye-view (BEV) representations to the 3D semantic occupancy. Compared with the BEV planes, the 3D semantic occupancy further provides structural information along the vertical direction. This paper presents OccFormer, a dual-path transformer network to effectively process the 3D volume for semantic occupancy prediction. OccFormer achieves a long-range, dynamic, and efficient encoding of the camera-generated 3D voxel features. It is obtained by decomposing the heavy 3D processing into the local and global transformer pathways along the horizontal plane. For the occupancy decoder, we adapt the vanilla Mask2Former for 3D semantic occupancy by proposing preserve-pooling and class-guided sampling, which notably mitigate the sparsity and class imbalance. Experimental results demonstrate that OccFormer significantly outperforms existing methods for semantic scene completion on SemanticKITTI dataset and for LiDAR semantic segmentation on nuScenes dataset. Code is available at https://github.com/zhangyp15/OccFormer.
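The decomposition into horizontal-plane pathways can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes the local path handles each height slice separately and the global path works on a height-averaged BEV plane, and it fuses the two by simple addition. The function names `horizontal_mix` and `dual_path` are hypothetical, and plain single-head self-attention stands in for the actual transformer blocks.

```python
import numpy as np


def horizontal_mix(plane: np.ndarray) -> np.ndarray:
    """Stand-in for a 2D transformer block (assumption, not the paper's block).

    Applies single-head self-attention, without learned weights, over all
    cells of one horizontal plane of shape (X, Y, C).
    """
    x, y, c = plane.shape
    tokens = plane.reshape(x * y, c)
    scores = tokens @ tokens.T / np.sqrt(c)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights @ tokens).reshape(x, y, c)


def dual_path(volume: np.ndarray) -> np.ndarray:
    """Process a voxel feature volume of shape (X, Y, Z, C) with two 2D paths."""
    depth = volume.shape[2]
    # Local path (assumed): each height slice is processed on its own plane.
    local = np.stack(
        [horizontal_mix(volume[:, :, z]) for z in range(depth)], axis=2
    )
    # Global path (assumed): collapse height by averaging, then process BEV.
    bev = horizontal_mix(volume.mean(axis=2))
    # Fusion (assumed): broadcast the BEV result along height and add.
    return local + bev[:, :, None, :]


if __name__ == "__main__":
    features = np.random.default_rng(0).normal(size=(4, 5, 3, 8))
    print(dual_path(features).shape)
```

In this sketch every attention call sees only the X·Y cells of a single plane, so the attention matrices have (X·Y)² entries each rather than the (X·Y·Z)² entries that full attention over the whole volume would need.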
