W-PoseNet: Dense Correspondence Regularized Pixel Pair Pose Regression

6D pose estimation is non-trivial: a method must cope with intrinsic appearance and shape variation and severe inter-object occlusion, and the problem is made more challenging by extrinsic large illumination changes and the low quality of data acquired in uncontrolled environments. This paper introduces a novel pose estimation algorithm, W-PoseNet, which densely regresses from input data to 6D pose and also to 3D coordinates in model space. In other words, local features learned for pose regression in our deep network are regularized by explicitly learning a pixel-wise correspondence mapping onto 3D pose-sensitive coordinates as an auxiliary task. Moreover, a sparse pair combination of pixel-wise features and soft voting on pixel-pair pose predictions are designed to improve robustness to inconsistent and sparse local features. Experimental results on the popular YCB-Video and LineMOD benchmarks show that the proposed W-PoseNet consistently achieves superior performance to state-of-the-art algorithms.
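
The sparse pixel-pair combination and soft voting described above can be illustrated with a minimal sketch. The abstract does not specify how pairs are sampled or how votes are weighted, so everything below is an assumption made for illustration: the function names `sample_pixel_pairs` and `soft_vote` are hypothetical, pairs are drawn uniformly at random, each pair is assumed to output a unit quaternion, a translation, and a scalar confidence score, and the votes are fused with softmax weights. The rotation fusion is a sign-aligned weighted quaternion mean, which is only a reasonable approximation when the votes are tightly clustered.

```python
import math
import random


def sample_pixel_pairs(num_pixels, num_pairs, seed=0):
    """Draw a sparse random subset of distinct pixel index pairs (i, j) with i < j.

    Uniform sampling is an assumption; the paper may select pairs differently.
    """
    rng = random.Random(seed)
    max_pairs = num_pixels * (num_pixels - 1) // 2
    pairs = set()
    while len(pairs) < min(num_pairs, max_pairs):
        i, j = rng.sample(range(num_pixels), 2)
        pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


def soft_vote(quats, trans, scores):
    """Fuse per-pair pose predictions using softmax(scores) as voting weights.

    quats:  list of unit quaternions [w, x, y, z], one per pixel pair
    trans:  list of translations [x, y, z], one per pixel pair
    scores: list of scalar confidences, one per pixel pair (assumed output)
    """
    # Numerically stable softmax over the confidence scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Translation: plain weighted mean of the votes.
    t = [sum(w * p[k] for w, p in zip(weights, trans)) for k in range(3)]

    # Rotation: q and -q encode the same rotation, so flip each vote onto the
    # hemisphere of the top-scoring vote before averaging, then renormalize.
    ref = quats[scores.index(top)]
    acc = [0.0, 0.0, 0.0, 0.0]
    for w, q in zip(weights, quats):
        sign = 1.0 if sum(a * b for a, b in zip(q, ref)) >= 0.0 else -1.0
        for k in range(4):
            acc[k] += w * sign * q[k]
    norm = math.sqrt(sum(a * a for a in acc))
    q_fused = [a / norm for a in acc]
    return q_fused, t, weights
```

In this sketch, a hard vote (taking only the top-scoring pair) would discard all other predictions, whereas the soft weights let many moderately confident pairs contribute, which is one plausible way to gain robustness when individual local features are inconsistent.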