8 months ago

Computer Vision

Semantic Segmentation

Computer Vision

Ziyi Wang Xumin Yu Yongming Rao Jie Zhou Jiwen Lu

Abstract

With the overwhelming trend of mask image modeling led by MAE, generativepre-training has shown a remarkable potential to boost the performance offundamental models in 2D vision. However, in 3D vision, the over-reliance onTransformer-based backbones and the unordered nature of point clouds haverestricted the further development of generative pre-training. In this paper,we propose a novel 3D-to-2D generative pre-training method that is adaptable toany point cloud model. We propose to generate view images from differentinstructed poses via the cross-attention mechanism as the pre-training scheme.Generating view images has more precise supervision than its point cloudcounterpart, thus assisting 3D backbones to have a finer comprehension of thegeometrical structure and stereoscopic relations of the point cloud.Experimental results have proved the superiority of our proposed 3D-to-2Dgenerative pre-training over previous pre-training methods. Our method is alsoeffective in boosting the performance of architecture-oriented approaches,achieving state-of-the-art performance when fine-tuning on ScanObjectNNclassification and ShapeNetPart segmentation tasks. Code is available athttps://github.com/wangzy22/TAP.

Source PDF View Code

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

Powered by MailChimp

8 months ago

Computer Vision

Semantic Segmentation

Computer Vision

Ziyi Wang Xumin Yu Yongming Rao Jie Zhou Jiwen Lu

Abstract

With the overwhelming trend of mask image modeling led by MAE, generativepre-training has shown a remarkable potential to boost the performance offundamental models in 2D vision. However, in 3D vision, the over-reliance onTransformer-based backbones and the unordered nature of point clouds haverestricted the further development of generative pre-training. In this paper,we propose a novel 3D-to-2D generative pre-training method that is adaptable toany point cloud model. We propose to generate view images from differentinstructed poses via the cross-attention mechanism as the pre-training scheme.Generating view images has more precise supervision than its point cloudcounterpart, thus assisting 3D backbones to have a finer comprehension of thegeometrical structure and stereoscopic relations of the point cloud.Experimental results have proved the superiority of our proposed 3D-to-2Dgenerative pre-training over previous pre-training methods. Our method is alsoeffective in boosting the performance of architecture-oriented approaches,achieving state-of-the-art performance when fine-tuning on ScanObjectNNclassification and ShapeNetPart segmentation tasks. Code is available athttps://github.com/wangzy22/TAP.

Source PDF View Code

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

Powered by MailChimp

Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models | Papers | HyperAI