PCP-MAE: Learning to Predict Centers for Point Masked Autoencoders

Masked autoencoders have been widely explored for point cloud self-supervised learning, where the point cloud is generally divided into visible and masked parts. These methods typically employ an encoder that accepts the visible patches (normalized) and their corresponding patch centers (positions) as input, and a decoder that accepts the encoder output together with the centers (positions) of the masked parts to reconstruct each point in the masked patches. The pre-trained encoder is then used for downstream tasks. In this paper, we present a motivating empirical result: when the centers of masked patches are fed directly to the decoder without any information from the encoder, the decoder still reconstructs well. In other words, the patch centers are important, and the reconstruction objective does not necessarily rely on the encoder's representations, which prevents the encoder from learning semantic representations. Based on this key observation, we propose a simple yet effective method, learning to Predict Centers for Point Masked AutoEncoders (PCP-MAE), which guides the model to predict these significant centers and uses the predicted centers in place of the directly provided ones. Specifically, we propose a Predicting Center Module (PCM) that shares parameters with the original encoder and adds an extra cross-attention to predict the centers. Our method has high pre-training efficiency compared to other alternatives and achieves large improvements over Point-MAE, surpassing it by 5.50% on OBJ-BG, 6.03% on OBJ-ONLY, and 5.17% on PB-T50-RS for 3D object classification on the ScanObjectNN dataset. The code is available at https://github.com/aHapBean/PCP-MAE.
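The following is a minimal PyTorch sketch of the center-prediction idea described above, not the authors' implementation: a hypothetical PredictingCenterModule reuses an encoder-style block (standing in for the parameters shared with the main encoder) and adds a cross-attention in which per-mask queries attend to visible-token features before a small head regresses the 3D centers. All names, dimensions, and the exact block layout are illustrative assumptions; consult the linked repository for the actual design.

# Sketch only: the shared-parameter block, the query construction, and the
# head are assumptions, not the PCP-MAE reference code.
import torch
import torch.nn as nn


class PredictingCenterModule(nn.Module):
    """Predicts masked-patch centers from visible-token features.

    `shared_block` stands in for the Transformer blocks whose weights
    would be shared with the main encoder; only the cross-attention and
    the regression head are PCM-specific in this sketch.
    """

    def __init__(self, dim: int = 384, num_heads: int = 6):
        super().__init__()
        self.shared_block = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True
        )
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.center_head = nn.Linear(dim, 3)  # regress (x, y, z) per masked patch

    def forward(self, visible_tokens: torch.Tensor, mask_queries: torch.Tensor):
        # visible_tokens: (B, N_vis, dim) features of the visible patches
        # mask_queries:   (B, N_mask, dim) one query per masked patch
        context = self.shared_block(visible_tokens)
        # Queries attend to visible-token context (cross-attention).
        attended, _ = self.cross_attn(mask_queries, context, context)
        return self.center_head(attended)  # (B, N_mask, 3) predicted centers


if __name__ == "__main__":
    B, n_vis, n_mask, dim = 2, 26, 38, 384
    pcm = PredictingCenterModule(dim=dim)
    vis = torch.randn(B, n_vis, dim)
    queries = torch.randn(B, n_mask, dim)
    print(pcm(vis, queries).shape)  # torch.Size([2, 38, 3])

The key point the sketch captures is that the decoder never receives the ground-truth masked centers: the encoder side must produce them, so the pre-training signal flows through the encoder's representations rather than bypassing them.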