Multi-Person 3D Pose and Shape Estimation via Inverse Kinematics and Refinement

Estimating 3D poses and shapes in the form of meshes from monocular RGB images is challenging; it is more difficult than estimating 3D poses alone in the form of skeletons or heatmaps. When interacting persons are involved, 3D mesh reconstruction becomes even more challenging due to the ambiguity introduced by person-to-person occlusions. To tackle these challenges, we propose a coarse-to-fine pipeline that benefits from 1) inverse kinematics applied to occlusion-robust 3D skeleton estimates and 2) a Transformer-based, relation-aware refinement technique. In our pipeline, we first obtain occlusion-robust 3D skeletons for multiple persons from an RGB image. We then apply inverse kinematics to convert the estimated skeletons into deformable 3D mesh parameters. Finally, we apply Transformer-based mesh refinement, which refines the obtained mesh parameters while considering intra- and inter-person relations of the 3D meshes. Through extensive experiments, we demonstrate the effectiveness of our method, which outperforms state-of-the-art methods on the 3DPW, MuPoTS, and AGORA datasets.
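The three-stage pipeline above (per-person 3D skeleton estimation, inverse kinematics to mesh parameters, relation-aware refinement) can be sketched schematically. Everything here is an illustrative assumption rather than the paper's actual method: the joint layout, the function names, and the simple bone-alignment IK step stand in for a full SMPL-style parameterization and the Transformer refiner.

```python
# Schematic sketch of the coarse-to-fine pipeline: (1) per-person 3D
# skeletons, (2) inverse kinematics to per-bone rotations, (3) a
# refinement hook. All names, shapes, and the bone-alignment IK are
# illustrative assumptions, not the paper's implementation.
import numpy as np

def align_bone(template_dir, target_dir):
    """One minimal IK step: rotation (Rodrigues' formula) mapping the
    rest-pose bone direction onto the estimated bone direction."""
    a = template_dir / np.linalg.norm(template_dir)
    b = target_dir / np.linalg.norm(target_dir)
    c = float(np.dot(a, b))
    if np.isclose(c, -1.0):
        # Antiparallel bones: rotate by pi about any axis orthogonal to a.
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    v = np.cross(a, b)
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])
    return np.eye(3) + K + K @ K / (1.0 + c)

def skeletons_to_mesh_params(skeletons, parents, template):
    """Stage 2: convert each person's estimated joints into per-bone
    rotations (a stand-in for deformable mesh pose parameters)."""
    all_params = []
    for joints in skeletons:
        rots = [align_bone(template[j] - template[p], joints[j] - joints[p])
                for j, p in enumerate(parents) if p >= 0]
        all_params.append(np.stack(rots))
    return all_params

def refine(all_params):
    """Stage 3 placeholder: a relation-aware refiner (a Transformer in
    the paper) would jointly update all persons' parameters using
    intra- and inter-person relations; here it is an identity stub."""
    return all_params

# Toy example: a 3-joint chain for two persons.
parents = [-1, 0, 1]                       # root joint, then a chain
template = np.array([[0.0, 0.0, 0.0],      # rest-pose joint positions
                     [1.0, 0.0, 0.0],
                     [2.0, 0.0, 0.0]])
person_a = np.array([[0.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],      # bones bent upward
                     [0.0, 2.0, 0.0]])
skeletons = [person_a, template.copy()]
params = refine(skeletons_to_mesh_params(skeletons, parents, template))
print(len(params), params[0].shape)        # one rotation per non-root bone
```

The IK step here only matches bone directions, which is enough to show the data flow; a real pipeline would also resolve twist about each bone and fit shape parameters.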