R2-MLP: Round-Roll MLP for Multi-View 3D Object Recognition

Recently, vision architectures based exclusively on multi-layer perceptrons (MLPs) have gained much attention in the computer vision community. MLP-like models achieve competitive performance on single-image 2D classification with less inductive bias and without hand-crafted convolution layers. In this work, we explore the effectiveness of MLP-based architectures for the view-based 3D object recognition task. We present an MLP-based architecture termed Round-Roll MLP (R$^2$-MLP). It extends the spatial-shift MLP backbone by considering the communication between patches from different views. R$^2$-MLP rolls part of the channels along the view dimension and promotes information exchange between neighboring views. We benchmark MLP results on the ModelNet10 and ModelNet40 datasets with ablations in various aspects. The experimental results show that, with a conceptually simple structure, our R$^2$-MLP achieves competitive performance compared with existing state-of-the-art methods.
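
The rolling operation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the tensor layout `(views, patches, channels)`, the function name `round_roll`, the `roll_fraction` parameter, and the choice to roll one channel group forward and another backward are all assumptions made for illustration. The abstract states only that part of the channels is rolled along the view dimension.

```python
import numpy as np

def round_roll(x, roll_fraction=0.25):
    """Circularly roll two channel groups along the view axis (illustrative sketch).

    x: array of shape (views, patches, channels).
    roll_fraction: hypothetical share of channels in each rolled group.
    """
    views, patches, channels = x.shape
    k = int(channels * roll_fraction)  # channels per rolled group (assumed split)
    out = x.copy()
    # Group 1: each view receives these channels from the previous view (wraps around).
    out[:, :, :k] = np.roll(x[:, :, :k], shift=1, axis=0)
    # Group 2: each view receives these channels from the next view (wraps around).
    out[:, :, k:2 * k] = np.roll(x[:, :, k:2 * k], shift=-1, axis=0)
    # Remaining channels stay with their original view.
    return out

# Toy input: 4 views, 2 patches, 8 channels; every entry of view v equals v.
x = np.arange(4, dtype=float)[:, None, None] * np.ones((4, 2, 8))
y = round_roll(x)
```

In this toy input, view 0 ends up holding channels from view 3 (first group), view 1 (second group), and itself (the rest), so a subsequent per-patch channel-mixing MLP would see features from neighboring views without any additional parameters.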