Self-Constrained Inference Optimization on Structural Groups for Human Pose Estimation

We observe that human poses exhibit strong group-wise structural correlationand spatial coupling between keypoints due to the biological constraints ofdifferent body parts. This group-wise structural correlation can be explored toimprove the accuracy and robustness of human pose estimation. In this work, wedevelop a self-constrained prediction-verification network to characterize andlearn the structural correlation between keypoints during training. During theinference stage, the feedback information from the verification network allowsus to perform further optimization of pose prediction, which significantlyimproves the performance of human pose estimation. Specifically, we partitionthe keypoints into groups according to the biological structure of human body.Within each group, the keypoints are further partitioned into two subsets,high-confidence base keypoints and low-confidence terminal keypoints. Wedevelop a self-constrained prediction-verification network to perform forwardand backward predictions between these keypoint subsets. One fundamentalchallenge in pose estimation, as well as in generic prediction tasks, is thatthere is no mechanism for us to verify if the obtained pose estimation orprediction results are accurate or not, since the ground truth is notavailable. Once successfully learned, the verification network serves as anaccuracy verification module for the forward pose prediction. During theinference stage, it can be used to guide the local optimization of the poseestimation results of low-confidence keypoints with the self-constrained losson high-confidence keypoints as the objective function. Our extensiveexperimental results on benchmark MS COCO and CrowdPose datasets demonstratethat the proposed method can significantly improve the pose estimation results.