
Virtual Sparse Convolution for Multimodal 3D Object Detection

Wu, Hai ; Wen, Chenglu ; Shi, Shaoshuai ; Li, Xin ; Wang, Cheng
Abstract

Recently, virtual/pseudo-point-based 3D object detection that seamlessly fuses RGB images and LiDAR data by depth completion has gained great attention. However, virtual points generated from an image are very dense, introducing a huge amount of redundant computation during detection. Meanwhile, noise introduced by inaccurate depth completion significantly degrades detection precision. This paper proposes a fast yet effective backbone, termed VirConvNet, based on a new operator VirConv (Virtual Sparse Convolution), for virtual-point-based 3D object detection. VirConv consists of two key designs: (1) StVD (Stochastic Voxel Discard) and (2) NRConv (Noise-Resistant Submanifold Convolution). StVD alleviates the computation problem by discarding large amounts of nearby redundant voxels. NRConv tackles the noise problem by encoding voxel features in both 2D image and 3D LiDAR space. By integrating VirConv, we first develop an efficient pipeline VirConv-L based on an early fusion design. Then, we build a high-precision pipeline VirConv-T based on a transformed refinement scheme. Finally, we develop a semi-supervised pipeline VirConv-S based on a pseudo-label framework. On the KITTI car 3D detection test leaderboard, our VirConv-L achieves 85% AP with a fast running speed of 56 ms. Our VirConv-T and VirConv-S attain high precision of 86.3% and 87.2% AP, and currently rank 2nd and 1st, respectively. The code is available at https://github.com/hailanyi/VirConv.
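
To make the StVD idea concrete, the following is a minimal NumPy sketch of randomly discarding voxels, with nearby voxels dropped more aggressively than distant ones. It is an illustration only and not the authors' implementation: the abstract does not specify the sampling scheme, so the function name `stochastic_voxel_discard`, the two-zone split, the `near_range` threshold, and the keep rates are all assumptions.

```python
import numpy as np


def stochastic_voxel_discard(voxel_centers, keep_rate_near=0.1,
                             keep_rate_far=1.0, near_range=20.0, rng=None):
    """Return a boolean mask of voxels to keep (hypothetical helper).

    voxel_centers: (N, 3) array of voxel centers in sensor coordinates.
    Voxels closer than `near_range` (horizontal distance, assumed to be in
    meters) are kept with probability `keep_rate_near`; all others are kept
    with probability `keep_rate_far`. All values are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Horizontal distance of each voxel from the sensor origin.
    dist = np.linalg.norm(voxel_centers[:, :2], axis=1)
    # Per-voxel keep probability: low for near voxels, high for far ones.
    keep_prob = np.where(dist < near_range, keep_rate_near, keep_rate_far)
    # Independent random draw per voxel.
    return rng.random(len(voxel_centers)) < keep_prob


# Usage: thin out a synthetic cloud before passing it to a sparse backbone.
rng = np.random.default_rng(0)
centers = rng.uniform(-50.0, 50.0, size=(10_000, 3))
mask = stochastic_voxel_discard(centers, rng=rng)
kept = centers[mask]
```

Because the mask is drawn independently per voxel, the same call can be repeated with a fresh random draw at each training iteration, which is one plausible reading of "stochastic" here; how the actual method schedules or bins the discard is left to the paper and the linked repository.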