
Humans have the remarkable ability to perceive objects as a whole, even when parts of them are occluded. This ability of amodal perception forms the basis of our perceptual and cognitive understanding of the world. To enable robots to reason with this capability, we formulate and propose a novel task that we name amodal panoptic segmentation. The goal of this task is to simultaneously predict the pixel-wise semantic segmentation labels of the visible regions of stuff classes and the instance segmentation labels of both the visible and occluded regions of thing classes. To facilitate research on this new task, we extend two established benchmark datasets with pixel-level amodal panoptic segmentation labels that we make publicly available as KITTI-360-APS and BDD100K-APS. We present several strong baselines, along with the amodal panoptic quality (APQ) and amodal parsing coverage (APC) metrics to quantify the performance in an interpretable manner. Furthermore, we propose the novel amodal panoptic segmentation network (APSNet), as a first step towards addressing this task by explicitly modeling the complex relationships between the occluders and occludees. Extensive experimental evaluations demonstrate that APSNet achieves state-of-the-art performance on both benchmarks and, more importantly, exemplifies the utility of amodal recognition. The benchmarks are available at http://amodal-panoptic.cs.uni-freiburg.de.
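To make the evaluation protocol concrete, the following is a minimal sketch of a single-class, APQ-style score, assuming it follows the standard panoptic quality (PQ) formulation but is computed over amodal (visible plus occluded) instance masks rather than visible masks only. The function names `amodal_pq` and `iou`, the dictionary layout, and the single-class simplification are illustrative assumptions, not taken from the benchmark toolkit.

```python
import numpy as np


def iou(mask_a, mask_b):
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union > 0 else 0.0


def amodal_pq(pred_segments, gt_segments, match_threshold=0.5):
    """PQ-style score over amodal instance masks for one class (illustrative).

    Each segment is a dict holding a boolean 'amodal_mask' covering both the
    visible and occluded extent of an instance. Predictions and ground truth
    are matched by IoU > 0.5, as in the usual panoptic-quality convention,
    which guarantees a unique match per ground-truth segment.
    """
    matched_gt, tp_ious = set(), []
    for pred in pred_segments:
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(gt_segments):
            if idx in matched_gt:
                continue
            score = iou(pred["amodal_mask"], gt["amodal_mask"])
            if score > best_iou:
                best_iou, best_idx = score, idx
        if best_iou > match_threshold:
            matched_gt.add(best_idx)
            tp_ious.append(best_iou)

    tp = len(tp_ious)
    fp = len(pred_segments) - tp
    fn = len(gt_segments) - tp
    denom = tp + 0.5 * fp + 0.5 * fn
    return sum(tp_ious) / denom if denom > 0 else 0.0
```

In a full benchmark evaluation, such a per-class score would be averaged over all classes, and stuff classes would be scored on their visible segmentation only, in line with the task definition above.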