DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving

Current end-to-end autonomous driving methods resort to unifying modular designs for various tasks (e.g., perception, prediction and planning). Although optimized in a planning-oriented spirit with a fully differentiable framework, existing end-to-end driving systems without ego-centric designs still suffer from unsatisfactory performance and inferior efficiency, owing to rasterized scene representation learning and redundant information transmission. In this paper, we revisit human driving behavior and propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving. Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction and an iterative motion planner. The sparse perception module performs detection, tracking and online mapping based on a sparse representation of the driving scene. The hierarchical interaction module selects the Closest In-Path Vehicle / Stationary (CIPV / CIPS) from coarse to fine, benefiting from an additional geometric prior. As for the iterative motion planner, both the selected interactive agents and the ego-vehicle are considered for joint motion prediction, where the output multi-modal ego-trajectories are optimized in an iterative fashion. Besides, both position-level motion diffusion and trajectory-level planning denoising are introduced for uncertainty modeling, facilitating the training stability and convergence of the whole framework. Extensive experiments conducted on the nuScenes and Bench2Drive datasets demonstrate the superior planning performance and great efficiency of DiFSD.
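The three-stage pipeline described above (sparse perception → hierarchical interaction → iterative motion planner) can be sketched conceptually as follows. This is not the authors' implementation: all names (`SceneState`, `difsd_forward`, the lateral gate of 1.5 m, the refinement stub) are hypothetical placeholders, and the diffusion/denoising components are omitted; the sketch only illustrates how sparse scene state flows through the modules.

```python
# Conceptual sketch (NOT the paper's code) of the DiFSD pipeline stages.
# All identifiers and thresholds are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class SceneState:
    """Sparse scene representation passed between modules."""
    agents: list = field(default_factory=list)        # detected / tracked agents
    map_elements: list = field(default_factory=list)  # online-mapped elements
    cipv: object = None                               # Closest In-Path Vehicle/Stationary
    ego_trajectories: list = field(default_factory=list)


def sparse_perception(sensor_tokens):
    """Detection, tracking and online mapping over sparse tokens (stubbed)."""
    agents = [t for t in sensor_tokens if t.get("kind") == "agent"]
    map_elements = [t for t in sensor_tokens if t.get("kind") == "map"]
    return SceneState(agents=agents, map_elements=map_elements)


def hierarchical_interaction(state):
    """Coarse-to-fine CIPV/CIPS selection; a lateral-offset gate stands in
    for the geometric prior, then the nearest in-path agent is picked."""
    in_path = [a for a in state.agents if abs(a["lateral"]) < 1.5]  # coarse gate
    state.cipv = min(in_path, key=lambda a: a["longitudinal"], default=None)
    return state


def iterative_motion_planner(state, num_iters=3):
    """Iteratively refine multi-modal ego trajectories; the paper's joint
    prediction with interactive agents and denoising are not modeled here."""
    trajs = [[(0.0, 0.0)]]  # a single trivial initial mode
    for _ in range(num_iters):
        # each iteration extends/refines every trajectory mode (stub step)
        trajs = [t + [(t[-1][0] + 1.0, 0.0)] for t in trajs]
    state.ego_trajectories = trajs
    return state


def difsd_forward(sensor_tokens):
    """End-to-end pass: perception -> interaction -> planning."""
    state = sparse_perception(sensor_tokens)
    state = hierarchical_interaction(state)
    return iterative_motion_planner(state)
```

In this toy flow, only sparse per-agent tokens (not rasterized grids) are carried between stages, which is the efficiency argument the abstract makes for the fully sparse, ego-centric design.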