Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields

Image translation and manipulation have gained increasing attention along with the rapid development of deep generative models. Although existing approaches have brought impressive results, they mainly operate in 2D space. In light of recent advances in NeRF-based 3D-aware generative models, we introduce a new task, Semantic-to-NeRF translation, which aims to reconstruct a 3D scene modelled by NeRF, conditioned on a single-view semantic mask as input. To kick off this novel task, we propose the Sem2NeRF framework. In particular, Sem2NeRF addresses this highly challenging task by encoding the semantic mask into a latent code that controls the 3D scene representation of a pre-trained decoder. To further improve the accuracy of the mapping, we integrate a new region-aware learning strategy into the design of both the encoder and the decoder. We verify the efficacy of the proposed Sem2NeRF and demonstrate that it outperforms several strong baselines on two benchmark datasets. Code and video are available at https://donydchen.github.io/sem2nerf/
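To make the encoder-to-pretrained-decoder pipeline concrete, the sketch below shows one plausible shape of the mapping described above: a convolutional encoder compresses a one-hot semantic mask into a latent code, which would then condition a frozen 3D-aware NeRF generator. This is a minimal illustration, not the paper's architecture; the class names, layer sizes, and the `pretrained_nerf_decoder` placeholder are assumptions, and the paper's actual design additionally uses a region-aware learning strategy in both encoder and decoder.

```python
import torch
import torch.nn as nn

class MaskEncoder(nn.Module):
    """Hypothetical encoder: single-view semantic mask -> scene latent code."""
    def __init__(self, num_classes: int = 19, latent_dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(num_classes, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling to a single feature vector
        )
        self.to_latent = nn.Linear(128, latent_dim)

    def forward(self, mask_onehot: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(mask_onehot).flatten(1)
        return self.to_latent(feat)  # latent code that conditions the decoder

encoder = MaskEncoder()
mask = torch.randn(1, 19, 128, 128)  # dummy one-hot semantic mask
latent = encoder(mask)               # (1, 256) scene latent
# `pretrained_nerf_decoder` is a stand-in for a frozen 3D-aware generator
# that would render the scene from an arbitrary camera pose:
# rgb = pretrained_nerf_decoder(latent, camera_pose)
```

Keeping the NeRF decoder frozen and learning only the mask-to-latent mapping is what makes single-view conditioning tractable: the decoder already carries a 3D prior, so the encoder only has to locate the scene in its latent space.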