
SVT-Net: Super Light-Weight Sparse Voxel Transformer for Large Scale Place Recognition

Fan, Zhaoxin ; Song, Zhenbo ; Liu, Hongyan ; Lu, Zhiwu ; He, Jun ; Du, Xiaoyong
Abstract

Point cloud-based large scale place recognition is fundamental for many applications like Simultaneous Localization and Mapping (SLAM). Although many models have been proposed and have achieved good performance by learning short-range local features, long-range contextual properties have often been neglected. Moreover, the model size has also become a bottleneck for their wide applications. To overcome these challenges, we propose a super light-weight network model termed SVT-Net for large scale place recognition. Specifically, on top of the highly efficient 3D Sparse Convolution (SP-Conv), an Atom-based Sparse Voxel Transformer (ASVT) and a Cluster-based Sparse Voxel Transformer (CSVT) are proposed to learn both short-range local features and long-range contextual features in this model. Consisting of ASVT and CSVT, SVT-Net can achieve state-of-the-art on benchmark datasets in terms of both accuracy and speed with a super-light model size (0.9M). Meanwhile, two simplified versions of SVT-Net are introduced, which also achieve state-of-the-art and further reduce the model size to 0.8M and 0.4M respectively.
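
The abstract names two ingredients, a sparse voxel representation and transformer-style attention for long-range context, without giving their internals. The following is only a generic illustrative sketch of those two ideas, not the ASVT or CSVT modules themselves: it keeps only occupied voxels of a point cloud and applies plain single-head scaled dot-product self-attention across them, so each voxel can draw on every other voxel regardless of distance. The names `voxelize`, `self_attention`, and `voxel_size`, the mean-of-points voxel feature, and the identity projections are all assumptions made for illustration.

```python
import numpy as np


def voxelize(points, voxel_size):
    """Quantize points to integer voxel coordinates and keep only occupied voxels.

    Returns the unique voxel coordinates and, as a simple assumed feature,
    the mean of the points that fall inside each voxel.
    """
    coords = np.floor(points / voxel_size).astype(np.int64)
    uniq, inverse = np.unique(coords, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # guard against shape differences across NumPy versions
    feats = np.zeros((len(uniq), points.shape[1]))
    np.add.at(feats, inverse, points)  # sum the points belonging to each voxel
    counts = np.bincount(inverse, minlength=len(uniq))
    feats /= counts[:, None]  # turn sums into means
    return uniq, feats


def self_attention(x):
    """Single-head scaled dot-product self-attention with identity projections."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over all voxels
    return weights @ x, weights


# Usage: a random cloud stands in for a LiDAR scan (hypothetical data).
rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(500, 3))
voxel_coords, voxel_feats = voxelize(cloud, voxel_size=0.25)
context_feats, attn = self_attention(voxel_feats)
```

Because the attention matrix has one row and one column per occupied voxel, its cost grows with the number of non-empty voxels rather than with the full dense grid, which is the general motivation for pairing attention with a sparse representation.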
