On the Calibration of Human Pose Estimation

Most 2D human pose estimation frameworks estimate keypoint confidence in an ad-hoc manner, using heuristics such as the maximum value of heatmaps. The confidence is part of the evaluation scheme, e.g., AP for the MSCOCO dataset, yet has been largely overlooked in the development of state-of-the-art methods. This paper takes the first steps in addressing miscalibration in pose estimation. From a calibration point of view, the confidence should be aligned with the pose accuracy. In practice, existing methods are poorly calibrated. We show, through theoretical analysis, why a miscalibration gap exists and how to narrow the gap. Simply predicting the instance size and adjusting the confidence function gives considerable AP improvements. Given the black-box nature of deep neural networks, however, it is not possible to fully close this gap with only closed-form adjustments. As such, we go one step further and learn network-specific adjustments by enforcing consistency between confidence and pose accuracy. Our proposed Calibrated ConfidenceNet (CCNet) is a light-weight post-hoc addition that improves AP by up to 1.4% on off-the-shelf pose estimation frameworks. Applied to the downstream task of mesh recovery, CCNet facilitates an additional 1.0mm decrease in 3D keypoint error.
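To make the calibration criterion concrete, the sketch below shows one way to quantify the confidence–accuracy gap: per-instance OKS as defined for MSCOCO keypoint evaluation, and a binned (ECE-style) difference between predicted confidence and OKS. This is only an illustration of what "confidence aligned with pose accuracy" means; the function names and the choice of a binned gap measure are assumptions, not the paper's adjustment or CCNet itself.

```python
import numpy as np

def oks(pred, gt, visible, area, kappa):
    """Object Keypoint Similarity (MSCOCO keypoint evaluation).

    pred, gt : (K, 2) predicted / ground-truth keypoint coordinates
    visible  : (K,) boolean mask of labelled keypoints
    area     : ground-truth instance area (s^2 in the OKS formula)
    kappa    : (K,) per-keypoint falloff constants (kappa_i = 2 * sigma_i)
    """
    d2 = np.sum((pred - gt) ** 2, axis=-1)
    e = d2 / (2.0 * area * kappa ** 2 + np.spacing(1))
    return np.exp(-e)[visible].mean()

def calibration_gap(confidences, accuracies, n_bins=10):
    """Binned gap between predicted confidence and pose accuracy
    (e.g., per-instance OKS): within each confidence bin, compare mean
    confidence to mean accuracy, weighted by the bin's sample fraction."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    gap = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences >= lo) & (confidences < hi)
        if mask.any():
            gap += mask.mean() * abs(confidences[mask].mean()
                                     - accuracies[mask].mean())
    return gap
```

Under this view, a perfectly calibrated detector would report a confidence equal to the OKS it will achieve, so the gap above would be zero; the abstract's claim is that heatmap-max heuristics leave a sizeable gap that size-aware and learned (CCNet) adjustments can shrink.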