DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving

End-to-end autonomous driving aims to build a fully differentiable system that takes raw sensor data as input and directly outputs the planned trajectory or control signals of the ego vehicle. State-of-the-art methods usually follow the "Teacher-Student" paradigm. The teacher model uses privileged information (ground-truth states of surrounding agents and map elements) to learn the driving strategy. The student model only has access to raw sensor data and conducts behavior cloning on the data collected by the teacher model. By eliminating the noise of the perception part during planning learning, state-of-the-art methods achieve better performance with significantly less data than coupled approaches. However, under the current Teacher-Student paradigm, the student model still needs to learn a planning head from scratch, which can be challenging due to the redundant and noisy nature of raw sensor inputs and the causal confusion issue of behavior cloning. In this work, we explore the possibility of directly adopting the strong teacher model to conduct planning while letting the student model focus more on the perception part. We find that, even when equipped with a SOTA perception model, directly letting the student model learn the required inputs of the teacher model leads to poor driving performance, which comes from the large distribution gap between the predicted privileged inputs and the ground truth. To this end, we propose DriveAdapter, which employs adapters with a feature alignment objective function between the student (perception) and teacher (planning) modules. Additionally, since the purely learning-based teacher model is itself imperfect and occasionally breaks safety rules, we propose a method of action-guided feature learning with a mask for those imperfect teacher features to further inject the priors of hand-crafted rules into the learning process.
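
To make the two training signals described above concrete, the following is a minimal sketch rather than the paper's implementation. It assumes an L1 distance for both the feature alignment and the action terms, a per-sample boolean flag marking whether the teacher's features are trusted, and a simple weighted sum of the two terms. All function and parameter names (`l1_distance`, `masked_alignment_loss`, `total_loss`, `teacher_is_reliable`, `action_weight`) are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def masked_alignment_loss(
    student_feats: List[Sequence[float]],
    teacher_feats: List[Sequence[float]],
    teacher_is_reliable: List[bool],
) -> float:
    """Feature alignment loss averaged over trusted samples only.

    Samples whose teacher features are flagged as unreliable (assumed here
    to mean the teacher's action violated a hand-crafted rule) are masked
    out, so the adapter is not pulled toward imperfect teacher features.
    """
    kept = [
        l1_distance(s, t)
        for s, t, ok in zip(student_feats, teacher_feats, teacher_is_reliable)
        if ok
    ]
    return sum(kept) / len(kept) if kept else 0.0


def total_loss(
    student_feats: List[Sequence[float]],
    teacher_feats: List[Sequence[float]],
    teacher_is_reliable: List[bool],
    pred_actions: List[Sequence[float]],
    target_actions: List[Sequence[float]],
    action_weight: float = 1.0,  # hypothetical weighting, not from the paper
) -> float:
    """Masked alignment term plus an action term applied to every sample.

    The action term supervises all samples, including masked ones, which is
    how rule-corrected target actions can still guide feature learning.
    """
    align = masked_alignment_loss(student_feats, teacher_feats, teacher_is_reliable)
    action = sum(
        l1_distance(p, t) for p, t in zip(pred_actions, target_actions)
    ) / len(pred_actions)
    return align + action_weight * action
```

In this sketch the mask removes only the alignment term for untrusted samples; the action term still applies to them, so a target action corrected by hand-crafted rules continues to supervise the student even where the teacher's intermediate features are not used.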