8 months ago

Convolutional Neural Network

Image Recognition

Method/Architecture

Computer Vision

Shuo Chen, Tan Yu, Ping Li

Abstract

Inspired by the great success achieved by CNNs in image recognition, view-based methods applied CNNs to model the projected views for 3D object understanding and achieved excellent performance. Nevertheless, multi-view CNN models cannot model the communication between patches from different views, which limits their effectiveness in 3D object recognition. Inspired by the recent success of the vision Transformer in image recognition, we propose a Multi-view Vision Transformer (MVT) for 3D object recognition. Since each patch feature in a Transformer block has a global receptive field, it naturally achieves communication between patches from different views. Meanwhile, it carries much less inductive bias than its CNN counterparts. Considering both effectiveness and efficiency, we develop a global-local structure for our MVT. Our experiments on two public benchmarks, ModelNet40 and ModelNet10, demonstrate the competitive performance of our MVT.
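
The claim that a Transformer block gives every patch a global receptive field can be illustrated with a minimal sketch. This is not the authors' implementation and does not model the global-local structure; it is plain single-head self-attention in NumPy, applied to patch tokens pooled from several views. The function name `self_attention`, the tensor sizes, and the random weights are all assumptions made for illustration.

```python
import numpy as np

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over all tokens
    return weights @ v, weights

rng = np.random.default_rng(0)
# Illustrative sizes only (assumed, not taken from the paper).
num_views, patches_per_view, dim = 4, 9, 16

# Random stand-ins for patch embeddings: (views, patches, dim).
view_tokens = rng.normal(size=(num_views, patches_per_view, dim))
# Pool the patches of every view into one token sequence.
all_tokens = view_tokens.reshape(num_views * patches_per_view, dim)

# Random projection matrices standing in for learned weights.
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))
out, weights = self_attention(all_tokens, w_q, w_k, w_v)

# Attention mass that the first patch of view 0 places on patches of the other views.
cross_view_mass = weights[0, patches_per_view:].sum()
print(out.shape, float(cross_view_mass))
```

Because the softmax runs over the pooled sequence, every patch assigns non-zero weight to patches from other views, which is the cross-view communication the abstract refers to. A per-view CNN has no comparable path between views until its features are merged.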
