Confidence Guided Stereo 3D Object Detection with Split Depth Estimation

Accurate and reliable 3D object detection is vital to safe autonomous driving. Despite recent developments, the performance gap between stereo-based methods and LiDAR-based methods is still considerable. Accurate depth estimation is crucial to the performance of stereo-based 3D object detection methods, particularly for those pixels associated with objects in the foreground. Moreover, stereo-based methods suffer from high variance in the depth estimation accuracy, which is often not considered in the object detection pipeline. To tackle these two issues, we propose CG-Stereo, a confidence-guided stereo 3D object detection pipeline that uses separate decoders for foreground and background pixels during depth estimation, and leverages the confidence estimation from the depth estimation network as a soft attention mechanism in the 3D object detector. Our approach outperforms all state-of-the-art stereo-based 3D detectors on the KITTI benchmark.
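
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the CG-Stereo implementation: the abstract does not specify how the two decoder outputs are combined or how the confidence enters the detector, so the hard mask-based selection and the per-pixel multiplicative weighting below are assumptions, and the function names `merge_split_depth` and `apply_confidence_attention` are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def merge_split_depth(fg_depth: Grid, bg_depth: Grid,
                      fg_mask: List[List[bool]]) -> Grid:
    """Combine two depth maps into one (assumed merging rule).

    Where fg_mask is True the foreground decoder's depth is used,
    otherwise the background decoder's depth is used.
    """
    return [
        [f if m else b for f, b, m in zip(f_row, b_row, m_row)]
        for f_row, b_row, m_row in zip(fg_depth, bg_depth, fg_mask)
    ]


def apply_confidence_attention(features: List[List[float]],
                               confidence: List[float]) -> List[List[float]]:
    """Scale each feature vector by its depth confidence in [0, 1].

    This is one simple form of soft attention (an assumption here):
    low-confidence depth estimates contribute less to the detector.
    """
    return [[c * x for x in feat] for feat, c in zip(features, confidence)]


# Toy example: one image row with two pixels.
depth = merge_split_depth(
    fg_depth=[[1.0, 2.0]],
    bg_depth=[[10.0, 20.0]],
    fg_mask=[[True, False]],
)
weighted = apply_confidence_attention(
    features=[[2.0, 4.0], [1.0, 3.0]],
    confidence=[0.5, 0.0],
)
```

In this toy input the first pixel takes its depth from the foreground map and the second from the background map, while the second feature vector is suppressed entirely because its confidence is zero.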