Vote from the Center: 6 DoF Pose Estimation in RGB-D Images by Radial Keypoint Voting

We propose a novel keypoint voting scheme based on intersecting spheres that is more accurate than existing schemes and allows for fewer, more disperse keypoints. The scheme is based upon the distance between points, which, as a 1D quantity, can be regressed more accurately than the 2D and 3D vector and offset quantities regressed in previous work, yielding more accurate keypoint localization. The scheme forms the basis of the proposed RCVPose method for 6 DoF pose estimation of 3D objects in RGB-D data, which is particularly effective at handling occlusions. A CNN is trained to estimate the distance between the 3D point corresponding to the depth mode of each RGB pixel and a set of 3 disperse keypoints defined in the object frame. At inference, a sphere of radius equal to this estimated distance is generated, centered at each 3D point. The surfaces of these spheres vote to increment a 3D accumulator space, the peaks of which indicate keypoint locations. The proposed radial voting scheme is more accurate than previous vector or offset schemes and is robust to disperse keypoints. Experiments demonstrate RCVPose to be highly accurate and competitive, achieving state-of-the-art results on the LINEMOD (99.7%) and YCB-Video (97.2%) datasets, notably scoring +4.9% higher (71.1%) than previous methods on the challenging Occlusion LINEMOD dataset, and on average outperforming all other published results from the BOP benchmark for these 3 datasets. Our code is available at http://www.github.com/aaronwool/rcvpose.
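To make the radial voting step concrete, here is a minimal NumPy sketch of sphere-surface voting into a 3D accumulator for a single keypoint. It assumes the per-point distances to that keypoint have already been estimated (standing in for the CNN output) and that the 3D points are expressed in the camera frame. The function name `radial_vote`, the brute-force voxel grid, and the half-voxel shell thickness are illustrative assumptions, not the RCVPose implementation.

```python
import numpy as np


def radial_vote(points, radii, grid_min, grid_max, voxel_size):
    """Hypothetical helper: sphere-surface voting into a voxel accumulator.

    points: (N, 3) 3D points, each the center of one voting sphere.
    radii:  (N,) estimated distances from each point to one keypoint.
    Returns the accumulator and the center of the voxel with the most votes.
    """
    points = np.asarray(points, dtype=float)
    radii = np.asarray(radii, dtype=float)
    grid_min = np.asarray(grid_min, dtype=float)
    grid_max = np.asarray(grid_max, dtype=float)

    # Number of voxels along each axis of the accumulator space.
    shape = np.ceil((grid_max - grid_min) / voxel_size).astype(int)

    # Coordinates of every voxel center, shape (X, Y, Z, 3).
    axes = [grid_min[d] + (np.arange(shape[d]) + 0.5) * voxel_size for d in range(3)]
    centers = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)

    acc = np.zeros(tuple(shape), dtype=np.int32)
    half = voxel_size / 2.0
    for p, r in zip(points, radii):
        # Assumption: a voxel lies on the sphere surface when its center is
        # within half a voxel of radius r from the sphere center p.
        d = np.linalg.norm(centers - p, axis=-1)
        acc += (np.abs(d - r) <= half).astype(np.int32)

    # The accumulator peak is the keypoint estimate, up to voxel resolution.
    peak = np.unravel_index(np.argmax(acc), acc.shape)
    return acc, centers[peak]
```

Each sphere increments only the voxels its surface passes through. When four or more sphere centers are non-coplanar, the surfaces meet near a single point, so the peak localizes the keypoint to within one voxel. The brute-force loop costs O(N·V) for N points and V voxels and is only meant to illustrate the idea, not to reflect the efficiency of the published method.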