8 months ago

3D Machine Vision

Object Tracking

Convolutional Neural Network

Method/Architecture

Computer Vision

Zhaoxin Fan, Zhenbo Song, Hongyan Liu*, Zhiwu Lu, Jun He*, Xiaoyong Du

Abstract

Point cloud-based large-scale place recognition is fundamental for many applications like Simultaneous Localization and Mapping (SLAM). Although many models have been proposed and have achieved good performance by learning short-range local features, long-range contextual properties have often been neglected. Moreover, model size has become a bottleneck for their wide application. To overcome these challenges, we propose a super light-weight network model termed SVT-Net for large-scale place recognition. Specifically, on top of the highly efficient 3D Sparse Convolution (SP-Conv), an Atom-based Sparse Voxel Transformer (ASVT) and a Cluster-based Sparse Voxel Transformer (CSVT) are proposed to learn both short-range local features and long-range contextual features in this model. Consisting of ASVT and CSVT, SVT-Net can achieve state-of-the-art performance on benchmark datasets in terms of both accuracy and speed with a super-light model size (0.9M). Meanwhile, two simplified versions of SVT-Net are introduced, which also achieve state-of-the-art performance and further reduce the model size to 0.8M and 0.4M respectively.
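
The abstract names two attention modules over sparse voxels but does not give their equations. The following is only a rough illustrative sketch of the general idea, not the authors' implementation: it assumes plain scaled dot-product attention, uses the mean point position of each occupied voxel as a stand-in feature, and assumes hard, externally supplied cluster assignments. All function names (`voxelize`, `atom_attention`, `cluster_attention`) and parameters are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention: each query gets a weighted mean of values."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def voxelize(points, voxel_size):
    """Sparse voxel grid: store only occupied voxels (index -> mean point)."""
    buckets = {}
    for p in points:
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets.setdefault(key, []).append(p)
    return {k: [sum(c) / len(ps) for c in zip(*ps)] for k, ps in buckets.items()}


def atom_attention(feats):
    """Assumed 'atom-level' variant: every occupied voxel attends to every other."""
    return attend(feats, feats, feats)


def cluster_attention(feats, assignments, num_clusters):
    """Assumed 'cluster-level' variant: voxels attend to mean-pooled cluster tokens."""
    dim = len(feats[0])
    sums = [[0.0] * dim for _ in range(num_clusters)]
    counts = [0] * num_clusters
    for f, c in zip(feats, assignments):
        counts[c] += 1
        for j in range(dim):
            sums[c][j] += f[j]
    # Drop empty clusters so every token is a valid mean.
    tokens = [[s / n for s in row] for row, n in zip(sums, counts) if n > 0]
    return attend(feats, tokens, tokens)


points = [(0.1, 0.2, 0.0), (0.3, 0.1, 0.2), (5.0, 5.1, 5.2), (9.7, 0.2, 0.1)]
voxels = voxelize(points, voxel_size=1.0)
feats = list(voxels.values())
atom_out = atom_attention(feats)
cluster_out = cluster_attention(feats, assignments=[0, 1, 0], num_clusters=2)
```

In this sketch the cluster variant compares each voxel with a handful of tokens instead of with every other voxel, which is one plausible way such a module could capture long-range context cheaply; how the paper actually forms and uses its clusters is not stated in the abstract.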

