PointPillars: Fast Encoders for Object Detection from Point Clouds

Object detection in point clouds is an important aspect of many robotics applications such as autonomous driving. In this paper we consider the problem of encoding a point cloud into a format appropriate for a downstream detection pipeline. Recent literature suggests two types of encoders: fixed encoders tend to be fast but sacrifice accuracy, while encoders that are learned from data are more accurate but slower. In this work we propose PointPillars, a novel encoder which utilizes PointNets to learn a representation of point clouds organized in vertical columns (pillars). While the encoded features can be used with any standard 2D convolutional detection architecture, we further propose a lean downstream network. Extensive experimentation shows that PointPillars outperforms previous encoders with respect to both speed and accuracy by a large margin. Despite using only lidar, our full detection pipeline significantly outperforms the state of the art, even among fusion methods, on both the 3D and bird's eye view KITTI benchmarks. This detection performance is achieved while running at 62 Hz: a 2-4 fold runtime improvement. A faster version of our method matches the state of the art at 105 Hz. These benchmarks suggest that PointPillars is an appropriate encoding for object detection in point clouds.
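The core idea named in the abstract, grouping points into vertical columns (pillars) on an x-y grid before a learned PointNet featurizes each column, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the grid ranges and the 0.16 m pillar size are illustrative assumptions, and the function name `group_into_pillars` is hypothetical.

```python
import numpy as np

def group_into_pillars(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0),
                       pillar_size=0.16):
    """Bucket an (N, 3) point cloud into vertical pillars on an x-y grid.

    Returns a dict mapping (ix, iy) grid-cell indices to (M, 3) arrays of
    the points falling in that pillar. A learned encoder (e.g. a PointNet)
    would then turn each pillar's points into a fixed-size feature vector,
    yielding a 2D pseudo-image for a standard convolutional detector.
    """
    x, y = points[:, 0], points[:, 1]
    # Keep only points inside the chosen x-y region of interest.
    in_range = ((x >= x_range[0]) & (x < x_range[1]) &
                (y >= y_range[0]) & (y < y_range[1]))
    pts = points[in_range]
    # Integer grid coordinates of each point's pillar.
    ix = ((pts[:, 0] - x_range[0]) // pillar_size).astype(int)
    iy = ((pts[:, 1] - y_range[0]) // pillar_size).astype(int)
    pillars = {}
    for key, p in zip(zip(ix, iy), pts):
        pillars.setdefault(key, []).append(p)
    return {k: np.stack(v) for k, v in pillars.items()}
```

Note that, unlike a full 3D voxel grid, the pillars are unbounded in z, which is what lets the downstream network operate with cheap 2D convolutions.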