
Abstract

LiDAR-based fully sparse architecture has garnered increasing attention. FSDv1 stands out as a representative work, achieving impressive efficacy and efficiency, albeit with intricate structures and handcrafted designs. In this paper, we present FSDv2, an evolution that aims to simplify the previous FSDv1 while eliminating the inductive bias introduced by its handcrafted instance-level representation, thus promoting better general applicability. To this end, we introduce the concept of virtual voxels, which takes over the clustering-based instance segmentation in FSDv1. Virtual voxels not only address the notorious issue of the Center Feature Missing problem in fully sparse detectors but also endow the framework with a more elegant and streamlined approach. Consequently, we develop a suite of components to complement the virtual voxel concept, including a virtual voxel encoder, a virtual voxel mixer, and a virtual voxel assignment strategy. Through empirical validation, we demonstrate that the virtual voxel mechanism is functionally similar to the handcrafted clustering in FSDv1 while being more general. We conduct experiments on three large-scale datasets: Waymo Open Dataset, Argoverse 2 dataset, and nuScenes dataset. Our results showcase state-of-the-art performance on all three datasets, highlighting the superiority of FSDv2 in long-range scenarios and its general applicability to achieve competitive performance across diverse scenarios. Moreover, we provide comprehensive experimental analysis to elucidate the workings of FSDv2. To foster reproducibility and further research, we have open-sourced FSDv2 at https://github.com/tusen-ai/SST.
