Learning an Animatable Detailed 3D Face Model from In-The-Wild Images

While current monocular 3D face reconstruction methods can recover fine geometric details, they suffer from several limitations. Some methods produce faces that cannot be realistically animated because they do not model how wrinkles vary with expression. Other methods are trained on high-quality face scans and do not generalize well to in-the-wild images. We present the first approach that regresses 3D face shape and animatable details that are specific to an individual but change with expression. Our model, DECA (Detailed Expression Capture and Animation), is trained to robustly produce a UV displacement map from a low-dimensional latent representation that consists of person-specific detail parameters and generic expression parameters, while a regressor is trained to predict detail, shape, albedo, expression, pose, and illumination parameters from a single image. To enable this, we introduce a novel detail-consistency loss that disentangles person-specific details from expression-dependent wrinkles. This disentanglement allows us to synthesize realistic person-specific wrinkles by controlling expression parameters while keeping person-specific details unchanged. DECA is learned from in-the-wild images with no paired 3D supervision and achieves state-of-the-art shape reconstruction accuracy on two benchmarks. Qualitative results on in-the-wild data demonstrate DECA's robustness and its ability to disentangle identity- and expression-dependent details, enabling animation of reconstructed faces. The model and code are publicly available at https://deca.is.tue.mpg.de.
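
To make the two mechanisms the abstract names concrete, the sketch below shows (a) a detail decoder that maps a person-specific detail code together with expression parameters to a UV displacement map, and (b) a detail-consistency loss that swaps detail codes between two images of the same subject. This is a minimal PyTorch sketch, not the authors' implementation: the class and function names (DetailDecoder, detail_consistency_loss), the latent dimensions, the decoder architecture, and the direct comparison of displacement maps (standing in for the paper's loss on re-rendered images) are all illustrative assumptions; the reference code is at https://deca.is.tue.mpg.de.

import torch
import torch.nn as nn

class DetailDecoder(nn.Module):
    """Maps a latent code (person-specific detail + expression + jaw pose)
    to a 256x256 UV displacement map. The dimensions and the simple
    transposed-conv stack are assumptions for illustration."""
    def __init__(self, detail_dim=128, expr_dim=50, jaw_dim=3):
        super().__init__()
        latent_dim = detail_dim + expr_dim + jaw_dim
        self.fc = nn.Linear(latent_dim, 512 * 4 * 4)
        # Six 2x upsampling stages: 4 -> 8 -> 16 -> 32 -> 64 -> 128 -> 256.
        chans = [512, 256, 128, 64, 32, 16]
        layers = []
        for c_in, c_out in zip(chans, chans[1:]):
            layers += [nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1),
                       nn.ReLU()]
        # Final layer outputs a single-channel displacement map; Tanh bounds
        # the raw output, which a real pipeline would rescale to metric units.
        layers += [nn.ConvTranspose2d(chans[-1], 1, 4, stride=2, padding=1),
                   nn.Tanh()]
        self.deconv = nn.Sequential(*layers)

    def forward(self, detail_code, expr, jaw_pose):
        z = torch.cat([detail_code, expr, jaw_pose], dim=-1)
        x = self.fc(z).view(-1, 512, 4, 4)
        return self.deconv(x)  # (B, 1, 256, 256) UV displacement map

def detail_consistency_loss(decoder, detail_i, detail_j, expr_i, jaw_i):
    """Detail codes from two images i, j of the SAME subject are swapped:
    decoding image j's detail code with image i's expression should
    reproduce image i's details. Comparing displacement maps directly is a
    simplification of the paper's loss, which compares re-rendered images."""
    disp_own = decoder(detail_i, expr_i, jaw_i)
    disp_swapped = decoder(detail_j, expr_i, jaw_i)
    return (disp_swapped - disp_own.detach()).abs().mean()

# Usage with random stand-in codes (batch of 2):
decoder = DetailDecoder()
d_i, d_j = torch.randn(2, 128), torch.randn(2, 128)
expr, jaw = torch.randn(2, 50), torch.randn(2, 3)
loss = detail_consistency_loss(decoder, d_i, d_j, expr, jaw)

The point of the swap is the disentanglement the abstract claims: if any two images of the same person must yield interchangeable detail codes, the detail code can only carry static, person-specific information, so expression-dependent wrinkles are forced to come from the expression parameters, which is what makes the reconstructed details animatable.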