Accurate 3D Body Shape Regression using Metric and Semantic Attributes

While methods that regress 3D human meshes from images have progressed rapidly, the estimated body shapes often do not capture the true human shape. This is problematic since, for many applications, accurate body shape is as important as pose. The key reason that body shape accuracy lags pose accuracy is the lack of data. While humans can label 2D joints, and these constrain 3D pose, it is not so easy to "label" 3D body shape. Since paired data with images and 3D body shape are rare, we exploit two sources of information: (1) we collect internet images of diverse "fashion" models together with a small set of anthropometric measurements; (2) we collect linguistic shape attributes for a wide range of 3D body meshes and the model images. Taken together, these datasets provide sufficient constraints to infer dense 3D shape. We exploit the anthropometric measurements and linguistic shape attributes in several novel ways to train a neural network, called SHAPY, that regresses 3D human pose and shape from an RGB image. We evaluate SHAPY on public benchmarks, but note that they either lack significant body shape variation, ground-truth shape, or clothing variation. Thus, we collect a new dataset for evaluating 3D human shape estimation, called HBW, containing photos of "Human Bodies in the Wild" for which we have ground-truth 3D body scans. On this new benchmark, SHAPY significantly outperforms state-of-the-art methods on the task of 3D body shape estimation. This is the first demonstration that 3D body shape regression from images can be trained from easy-to-obtain anthropometric measurements and linguistic shape attributes. Our model and data are available at: shapy.is.tue.mpg.de