Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation

Cross-view image translation is challenging because it involves images with drastically different views and severe deformation. In this paper, we propose a novel approach named Multi-Channel Attention SelectionGAN (SelectionGAN) that makes it possible to generate images of natural scenes from arbitrary viewpoints, based on an image of the scene and a novel semantic map. The proposed SelectionGAN explicitly utilizes the semantic information and consists of two stages. In the first stage, the condition image and the target semantic map are fed into a cycled semantic-guided generation network to produce initial coarse results. In the second stage, we refine the initial results using a multi-channel attention selection mechanism. Moreover, uncertainty maps automatically learned from the attention maps are used to guide the pixel loss for better network optimization. Extensive experiments on the Dayton, CVUSA and Ego2Top datasets show that our model generates significantly better results than state-of-the-art methods. The source code, data and trained models are available at https://github.com/Ha0Tang/SelectionGAN.
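The two ingredients named in the abstract can be sketched in a few lines of PyTorch. The sketch below is illustrative only and is not the authors' implementation: all tensor shapes, function names, and the exact loss weighting are assumptions. `attention_selection` shows a per-pixel softmax over N candidate generations, one plausible reading of the multi-channel attention selection, and `uncertainty_guided_l1` shows one common way (the heteroscedastic formulation of Kendall and Gal) for a learned per-pixel uncertainty map to modulate a pixel loss; SelectionGAN's actual formulation may differ.

```python
import torch
import torch.nn.functional as F


def attention_selection(candidates: torch.Tensor,
                        attn_logits: torch.Tensor) -> torch.Tensor:
    """Fuse N intermediate generations into one refined image.

    candidates:  (B, N, 3, H, W) candidate generations (assumed shape)
    attn_logits: (B, N, H, W) unnormalized per-pixel attention scores
    """
    # Softmax across the N candidate channels so the per-pixel
    # selection weights sum to one.
    attn = F.softmax(attn_logits, dim=1)          # (B, N, H, W)
    attn = attn.unsqueeze(2)                      # (B, N, 1, H, W)
    # Per-pixel convex combination of the candidates.
    return (attn * candidates).sum(dim=1)         # (B, 3, H, W)


def uncertainty_guided_l1(pred: torch.Tensor,
                          target: torch.Tensor,
                          log_var: torch.Tensor) -> torch.Tensor:
    """L1 pixel loss modulated by a learned log-variance map.

    log_var: (B, 1, H, W); high values down-weight the reconstruction
    error at unreliable pixels, while the +log_var term penalizes
    claiming uncertainty everywhere.
    """
    l1 = (pred - target).abs()
    return (torch.exp(-log_var) * l1 + log_var).mean()


# Toy usage with random tensors.
B, N, H, W = 2, 10, 64, 64
candidates = torch.rand(B, N, 3, H, W)
attn_logits = torch.randn(B, N, H, W)
refined = attention_selection(candidates, attn_logits)   # (B, 3, H, W)
```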