Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows

3D human pose estimation from monocular images is a highly ill-posed problem due to depth ambiguities and occlusions. Nonetheless, most existing works ignore these ambiguities and only estimate a single solution. In contrast, we generate a diverse set of hypotheses that represents the full posterior distribution of feasible 3D poses. To this end, we propose a normalizing flow based method that exploits the deterministic 3D-to-2D mapping to solve the ambiguous inverse 2D-to-3D problem. Additionally, uncertain detections and occlusions are effectively modeled by incorporating uncertainty information of the 2D detector as condition. Further keys to success are a learned 3D pose prior and a generalization of the best-of-M loss. We evaluate our approach on the two benchmark datasets Human3.6M and MPI-INF-3DHP, outperforming all comparable methods in most metrics. The implementation is available on GitHub.
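
For readers unfamiliar with the best-of-M loss mentioned above, the following is a minimal sketch of the plain (ungeneralized) form, assuming M hypotheses of J joints each and a mean per-joint Euclidean error. The function name `best_of_m_loss`, the array shapes, and the error metric are illustrative assumptions; the generalization proposed in this work is not specified in the abstract and is not reproduced here.

```python
import numpy as np


def best_of_m_loss(hypotheses, target):
    """Plain best-of-M loss (illustrative sketch, not the paper's generalized variant).

    hypotheses: array of shape (M, J, 3), M candidate 3D poses with J joints.
    target:     array of shape (J, 3), the reference 3D pose.
    Returns (loss, best_index): the smallest mean per-joint error and its hypothesis index.
    """
    hypotheses = np.asarray(hypotheses, dtype=float)
    target = np.asarray(target, dtype=float)
    # Euclidean distance per joint for every hypothesis: shape (M, J)
    per_joint_error = np.linalg.norm(hypotheses - target[None, :, :], axis=-1)
    # Mean error per hypothesis: shape (M,)
    per_hypothesis_error = per_joint_error.mean(axis=-1)
    # Only the best hypothesis contributes to the loss
    best_index = int(np.argmin(per_hypothesis_error))
    return float(per_hypothesis_error[best_index]), best_index


# Toy example with M = 3 hypotheses and J = 2 joints (made-up values).
target = np.zeros((2, 3))
hypotheses = np.stack([
    target + np.array([1.0, 0.0, 0.0]),  # every joint is off by 1
    target.copy(),                        # exact match
    target + np.array([0.0, 3.0, 4.0]),  # every joint is off by 5
])
loss, best = best_of_m_loss(hypotheses, target)
```

Because only the closest hypothesis is penalized, the remaining hypotheses are free to cover other plausible poses, which is what makes this kind of loss suitable for multi-hypothesis prediction.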