
Abstract

Understanding the world in 3D is a critical component of urban autonomous driving. Generally, the combination of expensive LiDAR sensors and stereo RGB imaging has been paramount for successful 3D object detection algorithms, whereas monocular image-only methods experience drastically reduced performance. We propose to reduce the gap by reformulating the monocular 3D detection problem as a standalone 3D region proposal network. We leverage the geometric relationship of 2D and 3D perspectives, allowing 3D boxes to utilize well-known and powerful convolutional features generated in the image-space. To help address the strenuous 3D parameter estimations, we further design depth-aware convolutional layers which enable location specific feature development and in consequence improved 3D scene understanding. Compared to prior work in monocular 3D detection, our method consists of only the proposed 3D region proposal network rather than relying on external networks, data, or multiple stages. M3D-RPN is able to significantly improve the performance of both monocular 3D Object Detection and Bird's Eye View tasks within the KITTI urban autonomous driving dataset, while efficiently using a shared multi-class model.
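The geometric relationship between 2D and 3D perspectives mentioned above can be illustrated with a standard pinhole projection. The following is a minimal sketch and not the paper's implementation: the function names (`project_point`, `box_corners`, `enclosing_2d_box`), the box parameterization (box center, width/height/length, yaw about the vertical axis), and the projection matrix values are all assumptions chosen for illustration.

```python
import math


def project_point(P, point):
    """Project a 3D camera-space point to pixel coordinates.

    P is a 3x4 projection matrix given as nested lists (illustrative values,
    not an actual KITTI calibration).
    """
    x, y, z = point
    homo = (x, y, z, 1.0)
    # Multiply P by the homogeneous point: one dot product per matrix row.
    u_h, v_h, depth = (sum(p * c for p, c in zip(row, homo)) for row in P)
    if depth <= 0:
        raise ValueError("point is behind the camera")
    return u_h / depth, v_h / depth


def box_corners(center, dims, yaw):
    """Return the 8 corners of a 3D box (hypothetical parameterization).

    center: (x, y, z) of the box center in camera coordinates.
    dims:   (width, height, length) in metres.
    yaw:    rotation about the vertical (y) axis in radians.
    """
    cx, cy, cz = center
    w, h, l = dims
    cos_t, sin_t = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-l / 2, l / 2):
        for dy in (-h / 2, h / 2):
            for dz in (-w / 2, w / 2):
                # Rotate the offset about the y axis, then translate.
                rx = cos_t * dx + sin_t * dz
                rz = -sin_t * dx + cos_t * dz
                corners.append((cx + rx, cy + dy, cz + rz))
    return corners


def enclosing_2d_box(P, center, dims, yaw):
    """Project all 8 corners and return the tight 2D box (u1, v1, u2, v2)."""
    pts = [project_point(P, c) for c in box_corners(center, dims, yaw)]
    us = [u for u, _ in pts]
    vs = [v for _, v in pts]
    return min(us), min(vs), max(us), max(vs)


if __name__ == "__main__":
    # Assumed intrinsics: focal length 700 px, principal point (600, 180).
    P = [[700.0, 0.0, 600.0, 0.0],
         [0.0, 700.0, 180.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
    print(enclosing_2d_box(P, center=(0.0, 0.0, 10.0), dims=(2.0, 2.0, 4.0), yaw=0.0))
```

Under these assumptions, every 3D box maps to a 2D region in the image, which is one way to see why image-space convolutional features can be tied to 3D box parameters.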
