8 months ago

Convolutional Neural Network

Object Detection

Method/Architecture

Computer Vision

Hongyu Zhou, Zheng Ge, Zeming Li, Xiangyu Zhang

Abstract

This paper proposes an efficient multi-camera to Bird's-Eye-View (BEV) view transformation method for 3D perception, dubbed MatrixVT. Existing view transformers either suffer from poor transformation efficiency or rely on device-specific operators, hindering the broad application of BEV models. In contrast, our method generates BEV features efficiently with only convolutions and matrix multiplications (MatMul). Specifically, we propose describing the BEV feature as the MatMul of the image feature and a sparse Feature Transporting Matrix (FTM). A Prime Extraction module is then introduced to compress the dimension of image features and reduce the FTM's sparsity. Moreover, we propose the Ring & Ray Decomposition to replace the FTM with two matrices and reformulate our pipeline to reduce calculation further. Compared to existing methods, MatrixVT enjoys faster speed and a smaller memory footprint while remaining deploy-friendly. Extensive experiments on the nuScenes benchmark demonstrate that our method is highly efficient but obtains results on par with the SOTA method in object detection and map segmentation tasks.
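The central idea of the abstract, expressing the BEV feature as a matrix product of image features with a sparse transporting matrix, can be illustrated with a toy example. The sketch below is not the authors' implementation: the function names, the binary one-cell-per-feature assignment, and the tiny sizes are all assumptions made for illustration. It only shows that a scatter-style accumulation into BEV cells gives the same result as a single MatMul with a mostly-zero matrix.

```python
# Toy illustration (hypothetical names and sizes, not the paper's code):
# a BEV feature map computed as FTM @ image_features.

def build_ftm(cell_index, num_cells):
    """Binary matrix: entry [b][i] is 1.0 when source feature i lands in BEV cell b."""
    ftm = [[0.0] * len(cell_index) for _ in range(num_cells)]
    for i, b in enumerate(cell_index):
        ftm[b][i] = 1.0
    return ftm

def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][c] for k in range(inner)) for c in range(cols)]
            for row in a]

def scatter_add(features, cell_index, num_cells):
    """Reference result: accumulate each source feature into its BEV cell."""
    channels = len(features[0])
    bev = [[0.0] * channels for _ in range(num_cells)]
    for feat, b in zip(features, cell_index):
        for c in range(channels):
            bev[b][c] += feat[c]
    return bev

# Four source features with two channels each (assumed values).
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
# Assumed geometry: the BEV cell each source feature projects to.
cell_index = [0, 2, 2, 3]
num_cells = 4

ftm = build_ftm(cell_index, num_cells)
bev_matmul = matmul(ftm, features)
bev_scatter = scatter_add(features, cell_index, num_cells)

# Fraction of zero entries in the transporting matrix.
nonzeros = sum(1 for row in ftm for v in row if v != 0.0)
sparsity = 1 - nonzeros / (num_cells * len(cell_index))
```

In this toy setup the matrix has one non-zero entry per column, so most of the multiply is spent on zeros. That is the kind of sparsity the abstract says Prime Extraction and the Ring & Ray Decomposition are meant to reduce, though how they do so is not described in the abstract and is not modeled here.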

