LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector

Stereo-based 3D detection aims at detecting 3D object bounding boxes from stereo images using intermediate depth maps or implicit 3D geometry representations, which provides a low-cost solution for 3D perception. However, its performance is still inferior to that of LiDAR-based detection algorithms. To detect and localize accurate 3D bounding boxes, LiDAR-based models can encode accurate object boundaries and surface normal directions from LiDAR point clouds. In contrast, the detection results of stereo-based detectors are easily affected by erroneous depth features due to the limitations of stereo matching. To solve this problem, we propose LIGA-Stereo (LiDAR Geometry Aware Stereo Detector) to learn stereo-based 3D detectors under the guidance of high-level geometry-aware representations of LiDAR-based detection models. In addition, we found that existing voxel-based stereo detectors fail to learn semantic features effectively from indirect 3D supervision. We therefore attach an auxiliary 2D detection head to provide direct 2D semantic supervision. Experimental results show that these two strategies improve the geometric and semantic representation capabilities. Compared with the state-of-the-art stereo detector, our method improves the 3D detection performance of cars, pedestrians, and cyclists by 10.44%, 5.69%, and 5.97% mAP, respectively, on the official KITTI benchmark. The gap between stereo-based and LiDAR-based 3D detectors is further narrowed.
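
To make the idea of LiDAR-guided feature learning concrete, the following is a minimal sketch of one way such guidance could be expressed as a training loss. The abstract does not specify the loss form, so everything here is an assumption for illustration: the masked mean-squared-error formulation, the flat per-voxel feature layout, the function names `masked_imitation_loss` and `total_loss`, and the weights `w_im` and `w_2d` are all hypothetical and are not taken from the method itself.

```python
from typing import Sequence


def masked_imitation_loss(
    stereo_feats: Sequence[Sequence[float]],
    lidar_feats: Sequence[Sequence[float]],
    mask: Sequence[bool],
) -> float:
    """Mean squared feature difference over the voxels selected by `mask`.

    Each entry of `stereo_feats` / `lidar_feats` is the feature vector of one
    voxel (hypothetical flat layout). The LiDAR features act as a fixed target
    that the stereo features are encouraged to match.
    """
    if not (len(stereo_feats) == len(lidar_feats) == len(mask)):
        raise ValueError("inputs must cover the same number of voxels")

    total = 0.0
    selected = 0
    for s_vec, l_vec, keep in zip(stereo_feats, lidar_feats, mask):
        if not keep:
            continue  # skip voxels outside the (assumed) region of interest
        if len(s_vec) != len(l_vec):
            raise ValueError("feature vectors must have the same length")
        total += sum((s - l) ** 2 for s, l in zip(s_vec, l_vec))
        selected += 1

    # Normalize by the number of selected voxels; 0.0 if nothing is selected.
    return total / selected if selected else 0.0


def total_loss(
    det_loss: float,
    imitation_loss: float,
    aux_2d_loss: float,
    w_im: float = 1.0,
    w_2d: float = 1.0,
) -> float:
    """Weighted sum of the 3D detection loss, the LiDAR-guidance term, and the
    auxiliary 2D detection loss (the weights are illustrative assumptions)."""
    return det_loss + w_im * imitation_loss + w_2d * aux_2d_loss
```

In this sketch the mask restricts the guidance to selected voxels, for example those near objects, so that empty space does not dominate the term; the auxiliary 2D loss enters only as a separate weighted summand alongside the 3D detection loss.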