DuLa-Net: A Dual-Projection Network for Estimating Room Layouts from a Single RGB Panorama

We present a deep learning framework, called DuLa-Net, to predict Manhattan-world 3D room layouts from a single RGB panorama. To achieve better prediction accuracy, our method leverages two projections of the panorama at once, namely the equirectangular panorama-view and the perspective ceiling-view, each of which contains different clues about the room layout. Our network architecture consists of two encoder-decoder branches, one analyzing each of the two views. In addition, a novel feature fusion structure is proposed to connect the two branches, which are then jointly trained to predict the 2D floor plans and layout heights. To learn more complex room layouts, we introduce the Realtor360 dataset, which contains panoramas of Manhattan-world room layouts with different numbers of corners. Experimental results show that our method outperforms recent state-of-the-art approaches in prediction accuracy and performance, especially for rooms with non-cuboid layouts.
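As a rough illustration of the geometry behind the second branch, the following is a minimal NumPy sketch of an equirectangular-to-perspective (ceiling-view) lookup: for each pixel of a virtual camera looking straight up, it computes the normalized (u, v) coordinates of the corresponding panorama pixel. The image size, field of view, and axis conventions here are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def ceiling_view_lookup(out_size=65, fov_deg=160.0):
    """For each pixel of a square ceiling-view image, return the
    normalized (u, v) coordinates in the equirectangular panorama.
    out_size and fov_deg are hypothetical choices for illustration."""
    # Focal length (in pixels) for the given field of view.
    f = (out_size / 2.0) / np.tan(np.radians(fov_deg) / 2.0)
    # Pixel grid centered on the principal point.
    xs = np.arange(out_size) - (out_size - 1) / 2.0
    x, y = np.meshgrid(xs, xs)
    # Unit rays from a camera looking straight up (+z) at the ceiling.
    dirs = np.stack([x, y, np.full_like(x, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    # Spherical angles: longitude in [-pi, pi], latitude in [-pi/2, pi/2].
    lon = np.arctan2(dirs[..., 0], dirs[..., 1])
    lat = np.arcsin(dirs[..., 2])
    # Normalized equirectangular coordinates (v = 0 is the zenith row).
    u = (lon / np.pi + 1.0) / 2.0
    v = 1.0 - (lat / (np.pi / 2.0) + 1.0) / 2.0
    return u, v

u, v = ceiling_view_lookup()
```

Sampling the panorama with bilinear interpolation at these (u, v) locations yields the perspective ceiling-view image fed to the second encoder-decoder branch; since the camera looks only upward, all samples fall in the upper half of the panorama (v < 0.5).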