Masked Discrimination for Self-Supervised Learning on Point Clouds

Masked autoencoding has achieved great success for self-supervised learning in the image and language domains. However, mask-based pretraining has yet to show benefits for point cloud understanding, likely because standard backbones like PointNet cannot properly handle the training-versus-testing distribution mismatch introduced by masking during training. In this paper, we bridge this gap by proposing a discriminative mask pretraining Transformer framework, MaskPoint, for point clouds. Our key idea is to represent the point cloud as discrete occupancy values (1 if part of the point cloud; 0 if not), and to perform simple binary classification between masked object points and sampled noise points as the proxy task. In this way, our approach is robust to the point sampling variance in point clouds, and facilitates learning rich representations. We evaluate our pretrained models across several downstream tasks, including 3D shape classification, segmentation, and real-world object detection, and demonstrate state-of-the-art results while achieving a significant pretraining speedup (e.g., 4.1x on ScanNet) compared to the prior state-of-the-art Transformer baseline. Code is available at https://github.com/haotian-liu/MaskPoint.
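
The abstract's proxy task can be illustrated with a minimal PyTorch sketch: label the masked-out object points as occupied (1), sample an equal number of noise points from the cloud's bounding box as empty (0), and train a decoder to discriminate between them with binary cross-entropy. The function and argument names below (masked_point_discrimination_loss, decoder, latent) are illustrative assumptions, not the repository's actual API.

```python
import torch
import torch.nn as nn

def masked_point_discrimination_loss(masked_points, decoder, latent, num_noise=None):
    """Sketch of the binary occupancy discrimination proxy task.

    masked_points: (B, M, 3) real object points hidden from the encoder
    decoder:       assumed module mapping (queries, latent) -> per-point logits
    latent:        encoder output computed from the visible (unmasked) points
    """
    B, M, _ = masked_points.shape
    num_noise = num_noise or M

    # Sample "fake" query points uniformly inside each cloud's bounding box.
    lo = masked_points.min(dim=1, keepdim=True).values  # (B, 1, 3)
    hi = masked_points.max(dim=1, keepdim=True).values  # (B, 1, 3)
    noise = lo + torch.rand(B, num_noise, 3, device=masked_points.device) * (hi - lo)

    # Real masked points are labeled 1 (occupied), noise points 0 (empty).
    queries = torch.cat([masked_points, noise], dim=1)  # (B, M + num_noise, 3)
    labels = torch.cat(
        [torch.ones(B, M), torch.zeros(B, num_noise)], dim=1
    ).to(masked_points.device)

    # The decoder classifies each query point as occupied or not.
    logits = decoder(queries, latent).squeeze(-1)  # (B, M + num_noise)
    return nn.functional.binary_cross_entropy_with_logits(logits, labels)
```

Because the target is a discrete occupancy decision rather than exact point coordinates, any resampling of the same surface yields the same labels, which is why this objective tolerates point sampling variance better than coordinate regression.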