
Abstract

Recent advancements in camera-based 3D object detection have introduced cross-modal knowledge distillation to bridge the performance gap with LiDAR 3D detectors, leveraging the precise geometric information in LiDAR point clouds. However, existing cross-modal knowledge distillation methods tend to overlook the inherent imperfections of LiDAR, such as the ambiguity of measurements on distant or occluded objects, which should not be transferred to the image detector. To mitigate these imperfections in the LiDAR teacher, we propose a novel method that leverages aleatoric uncertainty-free features from ground truth labels. In contrast to conventional label guidance approaches, we approximate the inverse function of the teacher's head to effectively embed label inputs into feature space. This approach provides additional accurate guidance alongside the LiDAR teacher, thereby boosting the performance of the image detector. Additionally, we introduce feature partitioning, which effectively transfers knowledge from the teacher modality while preserving the distinctive features of the student, thereby maximizing the potential of both modalities. Experimental results show that our approach improves mAP and NDS by 5.1 points and 4.9 points, respectively, over the baseline model, demonstrating its effectiveness. The code is available at https://github.com/sanmin0312/LabelDistill
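
The feature partitioning idea can be illustrated with a toy loss. This is only a minimal sketch under our own assumptions, not the implementation in the linked repository: the function names (`mse`, `partitioned_distill_loss`), the contiguous channel split, the mean-squared-error objective, and the weights `w_lidar` and `w_label` are all hypothetical. It assumes the student's channels are divided into one group guided by the LiDAR teacher, one group guided by label-derived features, and a remainder left unconstrained so that image-specific information is preserved.

```python
def mse(a, b):
    """Mean squared error between two equal-length, non-empty lists of floats."""
    if len(a) != len(b) or not a:
        raise ValueError("feature vectors must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def partitioned_distill_loss(student, lidar_feat, label_feat,
                             w_lidar=1.0, w_label=1.0):
    """Hypothetical partitioned distillation loss for one feature vector.

    The student's channels are split into three contiguous groups:
      1. the first len(lidar_feat) channels, pulled toward the LiDAR teacher;
      2. the next len(label_feat) channels, pulled toward the label feature;
      3. any remaining channels, which receive no distillation signal.
    """
    n_lidar, n_label = len(lidar_feat), len(label_feat)
    if n_lidar + n_label > len(student):
        raise ValueError("student has too few channels for both partitions")
    lidar_part = student[:n_lidar]
    label_part = student[n_lidar:n_lidar + n_label]
    return (w_lidar * mse(lidar_part, lidar_feat)
            + w_label * mse(label_part, label_feat))


# Toy example: 6 student channels, 2 per teacher, 2 left free.
student = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
lidar_feat = [1.0, 2.0]   # matches the first partition exactly
label_feat = [3.0, 5.0]   # differs from the second partition in one channel
loss = partitioned_distill_loss(student, lidar_feat, label_feat)
```

In this toy setup, changing the last two student channels leaves the loss unchanged, which is the sense in which the unconstrained partition can keep features distinctive to the image modality.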
