CR3DT: Camera-RADAR Fusion for 3D Detection and Tracking

To enable self-driving vehicles, accurate detection and tracking of surrounding objects is essential. While Light Detection and Ranging (LiDAR) sensors have set the benchmark for high-performance systems, the appeal of camera-only solutions lies in their cost-effectiveness. Notably, despite the prevalent use of Radio Detection and Ranging (RADAR) sensors in automotive systems, their potential in 3D detection and tracking has been largely disregarded due to data sparsity and measurement noise. As a recent development, the combination of RADARs and cameras is emerging as a promising solution. This paper presents Camera-RADAR 3D Detection and Tracking (CR3DT), a camera-RADAR fusion model for 3D object detection and Multi-Object Tracking (MOT). Building upon the foundations of the State-of-the-Art (SotA) camera-only BEVDet architecture, CR3DT demonstrates substantial improvements in both detection and tracking capabilities by incorporating the spatial and velocity information of the RADAR sensor. Experimental results demonstrate an absolute improvement in detection performance of 5.3% in mean Average Precision (mAP) and a 14.9% increase in Average Multi-Object Tracking Accuracy (AMOTA) on the nuScenes dataset when leveraging both modalities. CR3DT bridges the gap between high-performance and cost-effective perception systems in autonomous driving by capitalizing on the ubiquitous presence of RADAR in automotive applications. The code is available at: https://github.com/ETH-PBL/CR3DT.