HuManiFlow: Ancestor-Conditioned Normalising Flows on SO(3) Manifolds for Human Pose and Shape Distribution Estimation

Monocular 3D human pose and shape estimation is an ill-posed problem sincemultiple 3D solutions can explain a 2D image of a subject. Recent approachespredict a probability distribution over plausible 3D pose and shape parametersconditioned on the image. We show that these approaches exhibit a trade-offbetween three key properties: (i) accuracy - the likelihood of the ground-truth3D solution under the predicted distribution, (ii) sample-input consistency -the extent to which 3D samples from the predicted distribution match thevisible 2D image evidence, and (iii) sample diversity - the range of plausible3D solutions modelled by the predicted distribution. Our method, HuManiFlow,predicts simultaneously accurate, consistent and diverse distributions. We usethe human kinematic tree to factorise full body pose into ancestor-conditionedper-body-part pose distributions in an autoregressive manner. Per-body-partdistributions are implemented using normalising flows that respect the manifoldstructure of SO(3), the Lie group of per-body-part poses. We show thatill-posed, but ubiquitous, 3D point estimate losses reduce sample diversity,and employ only probabilistic training losses. Code is available at:https://github.com/akashsengupta1997/HuManiFlow.