HoHoNet: 360 Indoor Holistic Understanding with Latent Horizontal Features

We present HoHoNet, a versatile and efficient framework for holistic understanding of an indoor 360-degree panorama using a Latent Horizontal Feature (LHFeat). The compact LHFeat flattens the features along the vertical direction and has shown success in modeling per-column modality for room layout reconstruction. HoHoNet advances in two important aspects. First, the deep architecture is redesigned to run faster with improved accuracy. Second, we propose a novel horizon-to-dense module, which relaxes the per-column output shape constraint, allowing per-pixel dense prediction from LHFeat. HoHoNet is fast: it runs at 52 FPS and 110 FPS with ResNet-50 and ResNet-34 backbones, respectively, for modeling dense modalities from a high-resolution $512 \times 1024$ panorama. HoHoNet is also accurate. On the tasks of layout estimation and semantic segmentation, HoHoNet achieves results on par with the current state of the art. On dense depth estimation, HoHoNet outperforms all prior art by a large margin.
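
To make the tensor shapes concrete, the following is a minimal NumPy sketch of the two ideas named in the abstract: collapsing the vertical axis into a per-column latent, then expanding that latent back into a per-pixel map. It is only a shape-level illustration under stated assumptions, not the HoHoNet architecture. The function names `flatten_vertical` and `horizon_to_dense`, the toy sizes, and the random linear projection `weights` are all hypothetical stand-ins for learned components that the abstract does not specify.

```python
import numpy as np


def flatten_vertical(feat):
    """Fold the height axis of a (C, H, W) feature map into channels.

    Returns a (C * H, W) array: one latent vector per image column.
    Assumption: a plain reshape stands in for a learned height compression.
    """
    c, h, w = feat.shape
    return feat.reshape(c * h, w)


def horizon_to_dense(lhfeat, weights):
    """Expand a per-column latent (D, W) into a per-pixel map (H_out, W).

    Assumption: a single linear map of shape (H_out, D) is applied to each
    column independently; it stands in for the learned module.
    """
    return weights @ lhfeat


rng = np.random.default_rng(0)

# Toy backbone output: C=4 channels, H=8 rows, W=16 columns (hypothetical sizes).
feat = rng.standard_normal((4, 8, 16))
lhfeat = flatten_vertical(feat)  # shape (32, 16)

# Hypothetical projection producing 512 output rows per column.
weights = rng.standard_normal((512, lhfeat.shape[0]))
dense = horizon_to_dense(lhfeat, weights)  # shape (512, 16)

# Each output column depends only on the matching latent column.
perturbed = lhfeat.copy()
perturbed[:, 0] += 1.0
dense_perturbed = horizon_to_dense(perturbed, weights)
```

In this sketch the latent keeps the full horizontal resolution while the vertical extent lives only in the channel dimension, so recovering a dense output reduces to a per-column expansion.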