PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization

Recent advances in image-based 3D human shape estimation have been driven by the significant improvement in representation power afforded by deep neural networks. Although current approaches have demonstrated their potential in real-world settings, they still fail to produce reconstructions with the level of detail often present in the input images. We argue that this limitation stems primarily from two conflicting requirements: accurate predictions require large context, but precise predictions require high resolution. Due to memory limitations in current hardware, previous approaches tend to take low-resolution images as input to cover large spatial context, and produce less precise (or low-resolution) 3D estimates as a result. We address this limitation by formulating a multi-level architecture that is end-to-end trainable. A coarse level observes the whole image at lower resolution and focuses on holistic reasoning. This provides context to a fine level, which estimates highly detailed geometry by observing higher-resolution images. We demonstrate that our approach significantly outperforms existing state-of-the-art techniques on single-image human shape reconstruction by fully leveraging 1k-resolution input images.
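
The coarse-to-fine data flow described above can be illustrated with a toy sketch. This is not the network from the paper: the learned image encoders and implicit functions are replaced here by simple averages, and every name (`downsample`, `coarse_context`, `fine_detail`, `multi_level_query`), the downsampling factor, and the patch radius are hypothetical choices made only for illustration. It shows just the structural idea that a coarse level summarizes the whole image at reduced resolution, while a fine level reads full-resolution pixels and is conditioned on that summary.

```python
def downsample(image, factor):
    """Average-pool a 2D list of numbers by an integer factor."""
    rows = len(image) // factor
    cols = len(image[0]) // factor
    return [
        [
            sum(
                image[y * factor + dy][x * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ) / (factor * factor)
            for x in range(cols)
        ]
        for y in range(rows)
    ]


def coarse_context(image, factor):
    """Coarse level (toy stand-in): see the WHOLE image at low resolution.

    A real system would use a learned encoder; here the 'holistic'
    feature is simply the mean of the downsampled image.
    """
    low_res = downsample(image, factor)
    values = [v for row in low_res for v in row]
    return sum(values) / len(values)


def fine_detail(image, x, y, context, radius=1):
    """Fine level (toy stand-in): read a small FULL-resolution patch.

    The output is conditioned on the coarse context, here by expressing
    the local patch mean relative to the global summary.
    """
    height, width = len(image), len(image[0])
    patch = [
        image[j][i]
        for j in range(max(0, y - radius), min(height, y + radius + 1))
        for i in range(max(0, x - radius), min(width, x + radius + 1))
    ]
    local = sum(patch) / len(patch)
    return local - context


def multi_level_query(image, x, y, factor=2):
    """Run the coarse level once, then query the fine level at pixel (x, y)."""
    context = coarse_context(image, factor)
    detail = fine_detail(image, x, y, context)
    return context, detail


# Tiny 8x8 "image" whose intensity is x + y (an assumption for the demo).
image = [[x + y for x in range(8)] for y in range(8)]
context, detail = multi_level_query(image, x=4, y=4, factor=2)
```

In this toy, the coarse level touches only a quarter as many pixels as the input, and the fine level touches only a 3x3 neighborhood per query. This loosely mirrors how splitting the work between levels keeps the cost of high-resolution processing bounded.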