A Perceptual Shape Loss for Monocular 3D Face Reconstruction

Monocular 3D face reconstruction is a widely studied topic, and existing approaches tackle the problem either through fast neural network inference or offline iterative reconstruction of face geometry. In either case, carefully designed energy functions are minimized, commonly including loss terms like a photometric loss, a landmark reprojection loss, and others. In this work we propose a new loss function for monocular face capture, inspired by how humans would perceive the quality of a 3D face reconstruction given a particular image. It is widely known that shading provides a strong indicator for 3D shape in the human visual system. As such, our new 'perceptual' shape loss aims to judge the quality of a 3D face estimate using only shading cues. Our loss is implemented as a discriminator-style neural network that takes an input face image and a shaded render of the geometry estimate, and then predicts a score that perceptually evaluates how well the shaded render matches the given image. This 'critic' network operates on the RGB image and geometry render alone, without requiring an estimate of the albedo or illumination in the scene. Furthermore, our loss operates entirely in image space and is thus agnostic to mesh topology. We show how our new perceptual shape loss can be combined with traditional energy terms for monocular 3D face optimization and deep neural network regression, improving upon current state-of-the-art results.
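To make the critic's structure concrete, the following is a minimal sketch of a discriminator-style network that concatenates the RGB image with the shaded geometry render and outputs a scalar match score. The layer widths, depth, and the patch-wise output head are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch of a discriminator-style "critic" for a perceptual shape loss.
# Architecture details here are assumptions for illustration only.
import torch
import torch.nn as nn

class ShapeCritic(nn.Module):
    """Scores how well a shaded geometry render matches an RGB face image."""

    def __init__(self, base_ch: int = 64):
        super().__init__()
        layers = []
        in_ch = 6  # RGB image (3 channels) concatenated with shaded render (3)
        for out_ch in (base_ch, base_ch * 2, base_ch * 4, base_ch * 8):
            layers += [
                nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
                nn.LeakyReLU(0.2, inplace=True),
            ]
            in_ch = out_ch
        # 1-channel head producing a patch-wise score map
        layers.append(nn.Conv2d(in_ch, 1, kernel_size=4, stride=1, padding=1))
        self.net = nn.Sequential(*layers)

    def forward(self, image: torch.Tensor, render: torch.Tensor) -> torch.Tensor:
        # image, render: (B, 3, H, W); no albedo or illumination estimate needed
        x = torch.cat([image, render], dim=1)
        return self.net(x).mean(dim=(1, 2, 3))  # one scalar score per sample
```

In a fitting or regression pipeline, such a critic could be used alongside traditional terms, e.g. `loss = photometric + landmark - weight * critic(image, render).mean()`, where the weight balancing the perceptual term against the others is a hyperparameter.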