Monocular Expressive Body Regression through Body-Driven Attention

Choutas, Vasileios; Pavlakos, Georgios; Bolkart, Timo; Tzionas, Dimitrios; Black, Michael J.
Abstract

To understand how people look, interact, or perform tasks, we need to quickly and accurately capture their 3D body, face, and hands together from an RGB image. Most existing methods focus only on parts of the body. A few recent approaches reconstruct full expressive 3D humans from images using 3D body models that include the face and hands. These methods are optimization-based and thus slow, prone to local optima, and require 2D keypoints as input. We address these limitations by introducing ExPose (EXpressive POse and Shape rEgression), which directly regresses the body, face, and hands, in SMPL-X format, from an RGB image. This is a hard problem due to the high dimensionality of the body and the lack of expressive training data. Additionally, hands and faces are much smaller than the body, occupying very few image pixels. This makes hand and face estimation hard when body images are downscaled for neural networks. We make three main contributions. First, we account for the lack of training data by curating a dataset of SMPL-X fits on in-the-wild images. Second, we observe that body estimation localizes the face and hands reasonably well. We introduce body-driven attention for face and hand regions in the original image to extract higher-resolution crops that are fed to dedicated refinement modules. Third, these modules exploit part-specific knowledge from existing face- and hand-only datasets. ExPose estimates expressive 3D humans more accurately than existing optimization methods at a small fraction of the computational cost. Our data, model and code are available for research at https://expose.is.tue.mpg.de.
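
The body-driven attention step can be illustrated with a minimal sketch. It is not the ExPose implementation: it assumes the body network has already produced 2D keypoints for a part (a hand or the face) in the coordinates of a uniformly downscaled copy of the full image, and it maps them back to the original image to cut a higher-resolution crop. The helper names `part_crop_box` and `crop`, the padding factor `scale`, and the image sizes are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def part_crop_box(
    keypoints_lowres: Sequence[Point],
    lowres_size: Tuple[int, int],
    orig_size: Tuple[int, int],
    scale: float = 1.5,
) -> Box:
    """Return a padded square box in the original image around part keypoints.

    Assumption: the low-resolution body image is a uniform resize of the
    full original image, so a per-axis scale factor maps coordinates back.
    """
    low_w, low_h = lowres_size
    orig_w, orig_h = orig_size
    sx, sy = orig_w / low_w, orig_h / low_h

    # Lift the keypoints into original-image pixel coordinates.
    xs = [x * sx for x, _ in keypoints_lowres]
    ys = [y * sy for _, y in keypoints_lowres]

    # Square box centred on the keypoints, enlarged by the padding factor.
    cx = (min(xs) + max(xs)) / 2
    cy = (min(ys) + max(ys)) / 2
    half = scale * max(max(xs) - min(xs), max(ys) - min(ys)) / 2

    # Clamp to the image bounds.
    left = max(0, int(round(cx - half)))
    top = max(0, int(round(cy - half)))
    right = min(orig_w, int(round(cx + half)))
    bottom = min(orig_h, int(round(cy + half)))
    return left, top, right, bottom


def crop(image: List[list], box: Box) -> List[list]:
    """Cut the box out of an image stored as a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


if __name__ == "__main__":
    # Hypothetical hand keypoints predicted on a 256x256 body image.
    hand_kps = [(100.0, 50.0), (110.0, 60.0)]
    box = part_crop_box(hand_kps, (256, 256), (1024, 1024))
    image = [[0] * 1024 for _ in range(1024)]
    patch = crop(image, box)
    print(box, len(patch), len(patch[0]))
```

In this toy setting the hand spans about 10 pixels in the 256x256 body image, while the padded crop taken from the 1024x1024 original is 60 pixels wide, which is the kind of resolution gain a dedicated hand or face refinement module could exploit.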