Collaborative Regression of Expressive Bodies using Moderation

Recovering expressive humans from images is essential for understanding human behavior. Methods that estimate 3D bodies, faces, or hands have progressed significantly, yet separately. Face methods recover accurate 3D shape and geometric details, but need a tight crop and struggle with extreme views and low resolution. Whole-body methods are robust to a wide range of poses and resolutions, but provide only a rough 3D face shape without details like wrinkles. To get the best of both worlds, we introduce PIXIE, which produces animatable, whole-body 3D avatars with realistic facial detail from a single image. PIXIE builds on two key observations. First, existing work combines independent estimates from body, face, and hand experts by trusting them equally. PIXIE instead introduces a novel moderator that merges the features of the experts, weighted by their confidence. All part experts can contribute to the whole, using SMPL-X's shape space, which is shared across all body parts. Second, human shape is highly correlated with gender, but existing work ignores this. We label training images as male, female, or non-binary, and train PIXIE to infer "gendered" 3D body shapes with a novel shape loss. In addition to 3D body pose and shape parameters, PIXIE estimates expression, illumination, albedo, and 3D facial surface displacements. Quantitative and qualitative evaluations show that PIXIE estimates more accurate whole-body shape and detailed face shape than the state of the art. Models and code are available at https://pixie.is.tue.mpg.de.
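The idea of a moderator that merges expert features "weighted by their confidence" can be illustrated with a small sketch. This is not PIXIE's implementation. It assumes that each expert (for example, the body expert and a face or hand expert) yields a feature vector of the same dimension along with a scalar confidence score, and that the two are fused by a softmax-weighted sum. The names `moderate`, `softmax`, and the `temperature` parameter are hypothetical and introduced only for illustration.

```python
import math
from typing import Sequence


def softmax(scores: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax; lower temperature gives sharper weights."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def moderate(body_feat: Sequence[float], part_feat: Sequence[float],
             body_conf: float, part_conf: float,
             temperature: float = 1.0) -> tuple[list[float], tuple[float, float]]:
    """Hypothetical moderator: blend two expert feature vectors by confidence.

    Assumption (not from the paper): each expert provides one scalar
    confidence, and the fused feature is the softmax-weighted sum.
    """
    if len(body_feat) != len(part_feat):
        raise ValueError("expert features must have the same dimension")
    w_body, w_part = softmax([body_conf, part_conf], temperature)
    fused = [w_body * b + w_part * p for b, p in zip(body_feat, part_feat)]
    return fused, (w_body, w_part)


# Equal confidence: the experts are averaged (the "trust equally" baseline).
fused, weights = moderate([1.0, 0.0], [0.0, 1.0], body_conf=0.0, part_conf=0.0)
print(fused, weights)

# A confident part expert (e.g. a sharp, frontal face crop) dominates the blend.
fused, weights = moderate([0.0, 0.0], [2.0, 2.0], body_conf=-10.0, part_conf=10.0)
print(fused, weights)
```

With equal scores the sketch reduces to plain averaging, which corresponds to the "trust all experts equally" strategy the abstract criticizes. Unequal scores let whichever expert is more reliable for a given image dominate. A learned moderator would predict these scores from the image features rather than receive them as inputs.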