Monocular 3D Human Pose Estimation by Generation and Ordinal Ranking

Monocular 3D human-pose estimation from static images is a challenging problem, due to the curse of dimensionality and the ill-posed nature of lifting 2D-to-3D. In this paper, we propose a Deep Conditional Variational Autoencoder (CVAE) based model that synthesizes diverse, anatomically plausible 3D-pose samples conditioned on the estimated 2D-pose. We show that the CVAE-based 3D-pose sample set is consistent with the 2D-pose and helps tackle the inherent ambiguity in 2D-to-3D lifting. We propose two strategies for obtaining the final 3D pose: (a) using depth-ordering/ordinal relations to score and weight-average the candidate 3D-poses, referred to as OrdinalScore, and (b) using supervision from an Oracle. We report close to state-of-the-art results on two benchmark datasets using OrdinalScore, and state-of-the-art results using the Oracle. We also show that our pipeline yields competitive results without paired image-to-3D annotations. The training and evaluation code is available at https://github.com/ssfootball04/generative_pose.
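
To make the scoring idea concrete, the following is a minimal sketch of how candidate 3D poses could be scored against pairwise depth-ordering relations and then fused by a weighted average. It is an illustration under stated assumptions, not the implementation used in this work: the function names, the sign convention for the ordinal matrix, the agreement-fraction score, and the softmax `temperature` are all hypothetical choices, and the target ordinal relations are simply assumed to be given as input.

```python
import numpy as np

def ordinal_matrix(pose, tol=0.0):
    """Pairwise depth ordering for one pose of shape (J, 3).

    Assumed convention: entry (i, j) is +1 if joint i has larger depth (z)
    than joint j, -1 if smaller, and 0 if the depths differ by at most tol.
    """
    z = pose[:, 2]
    diff = z[:, None] - z[None, :]
    rel = np.sign(diff)
    rel[np.abs(diff) <= tol] = 0
    return rel.astype(int)

def ordinal_scores(candidates, target_rel, tol=0.0):
    """Score each candidate (N, J, 3) by the fraction of joint pairs whose
    depth ordering agrees with target_rel (J, J). Hypothetical scoring rule."""
    iu = np.triu_indices(target_rel.shape[0], k=1)  # each unordered pair once
    scores = [np.mean(ordinal_matrix(p, tol)[iu] == target_rel[iu])
              for p in candidates]
    return np.array(scores)

def weighted_average_pose(candidates, scores, temperature=0.1):
    """Fuse candidates with softmax weights over the scores.

    The softmax and its temperature are assumptions made for this sketch.
    """
    logits = scores / temperature
    logits -= logits.max()          # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()
    fused = np.tensordot(weights, candidates, axes=1)  # (J, 3)
    return fused, weights

# Toy usage: two candidates, one agreeing with the target ordering.
reference = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 2.0], [0.0, 0.0, 3.0]])
target_rel = ordinal_matrix(reference)
candidates = np.stack([reference + [0.0, 0.0, 0.5],    # same ordering
                       reference * [1.0, 1.0, -1.0]])  # reversed ordering
scores = ordinal_scores(candidates, target_rel)
fused, weights = weighted_average_pose(candidates, scores)
```

In this toy setup the candidate whose depth ordering matches the target receives nearly all of the weight, so the fused pose stays close to it. A lower `temperature` makes the fusion approach picking the single best-scoring candidate, while a higher one approaches a plain mean over the sample set.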