Multi-View Silhouette and Depth Decomposition for High Resolution 3D Object Representation

We consider the problem of scaling deep generative shape models to high resolution. Drawing motivation from the canonical view representation of objects, we introduce a novel method for the fast up-sampling of 3D objects in voxel space through networks that perform super-resolution on the six orthographic depth projections. This allows us to generate high-resolution objects with more efficient scaling than methods which work directly in 3D. We decompose the problem of 2D depth super-resolution into silhouette and depth prediction to capture both structure and fine detail, which allows our method to generate sharper edges than a single network would. We evaluate our work on multiple experiments concerning high-resolution 3D objects, and show our system is capable of accurately predicting novel objects at resolutions as large as 512$\times$512$\times$512 -- the highest resolution reported for this task. We achieve state-of-the-art performance on 3D object reconstruction from RGB images on the ShapeNet dataset, and further demonstrate the first effective 3D super-resolution method.
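
To make the decomposition concrete, below is a minimal NumPy sketch of the two geometric steps the abstract relies on: extracting the six orthographic depth maps of a voxel grid, and carving a high-resolution object back out of (super-resolved) depth maps. The function names, the depth convention (index of the first occupied voxel along each ray, with N marking empty rays), and the carving rule are illustrative assumptions, not the authors' implementation; the learned silhouette and depth networks are stubbed out with nearest-neighbour upsampling.

```python
import numpy as np


def orthographic_depth_maps(voxels):
    """Six orthographic depth maps of a binary (N, N, N) occupancy grid.

    Depth at a pixel = index of the first occupied voxel along the ray;
    rays that hit nothing are marked with N ("no surface").
    """
    n = voxels.shape[0]
    odms = []
    for axis in range(3):
        for flip in (False, True):
            v = np.flip(voxels, axis=axis) if flip else voxels
            silhouette = v.any(axis=axis)          # structure: where a surface exists
            depth = np.argmax(v != 0, axis=axis)   # fine detail: where along the ray
            odms.append(np.where(silhouette, depth, n))
    return odms


def carve(voxels_hi, odms):
    """Carve a high-res voxel grid to be consistent with six depth maps:
    every voxel lying in front of the predicted surface in any view is
    removed (one simple fusion rule; the paper's may differ in detail).
    """
    n = voxels_hi.shape[0]
    out = voxels_hi.copy()
    views = [(a, f) for a in range(3) for f in (False, True)]
    for odm, (axis, flip) in zip(odms, views):
        shape = [1, 1, 1]
        shape[axis] = n
        ray = np.arange(n).reshape(shape)          # position along the view axis
        in_front = ray < np.expand_dims(odm, axis=axis)
        if flip:
            in_front = np.flip(in_front, axis=axis)
        out[in_front] = 0
    return out


# Toy usage: a 32^3 sphere "super-resolved" to 256^3. In place of the learned
# networks we upsample each depth map by nearest neighbour and rescale its
# depth values by the same factor.
if __name__ == "__main__":
    n, factor = 32, 8
    x, y, z = np.mgrid[:n, :n, :n] - n / 2
    low = (x**2 + y**2 + z**2 < (n / 3) ** 2).astype(np.uint8)

    odms_hi = [np.repeat(np.repeat(d, factor, 0), factor, 1) * factor
               for d in orthographic_depth_maps(low)]
    coarse = np.repeat(np.repeat(np.repeat(low, factor, 0), factor, 1), factor, 2)
    high = carve(coarse, odms_hi)
    print(high.shape, high.sum())  # (256, 256, 256), occupied-voxel count
```

In the paper, the missing high-frequency detail would come from learned per-view predictions, with silhouette (structure) and depth within the silhouette (fine detail) handled separately, rather than the nearest-neighbour stand-in above.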