Pose2Seg: Detection Free Human Instance Segmentation

The standard approach to image instance segmentation is to perform object detection first, and then segment the object from the detection bounding-box. More recently, deep learning methods like Mask R-CNN perform them jointly. However, little research takes into account the uniqueness of the "human" category, which can be well defined by the pose skeleton. Moreover, the human pose skeleton can be used to better distinguish instances with heavy occlusion than bounding-boxes can. In this paper, we present a new pose-based instance segmentation framework for humans which separates instances based on human pose, rather than proposal region detection. We demonstrate that our pose-based framework can achieve better accuracy than the state-of-the-art detection-based approach on the human instance segmentation problem, and can moreover better handle occlusion. Furthermore, there are few public datasets containing many heavily occluded humans along with comprehensive annotations, which makes this a challenging problem seldom noticed by researchers. Therefore, in this paper we introduce a new benchmark, "Occluded Human (OCHuman)", which focuses on occluded humans with comprehensive annotations including bounding-box, human pose and instance masks. This dataset contains 8110 human instances with detailed annotations across 4731 images. With an average MaxIoU of 0.67 per person, OCHuman is the most complex and challenging dataset related to human instance segmentation. Through this dataset, we want to emphasize occlusion as a challenging problem for researchers to study.
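
The abstract does not define MaxIoU. As a rough illustration, the sketch below assumes it is, for each person, the largest bounding-box IoU between that person and any other person in the same image, so that a higher value indicates heavier overlap. The function names `box_iou` and `max_iou_per_person` and the `(x1, y1, x2, y2)` box layout are hypothetical choices made for this example, not part of the dataset tooling.

```python
from typing import List, Sequence

# Assumed box layout for this sketch: (x1, y1, x2, y2), with x2 >= x1 and y2 >= y1.
Box = Sequence[float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap rectangle, clamped at zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def max_iou_per_person(boxes: List[Box]) -> List[float]:
    """For each box, the highest IoU with any *other* box in the same image.

    Assumption: this is the per-person MaxIoU the abstract refers to.
    A person alone in an image gets 0.0.
    """
    result = []
    for i, a in enumerate(boxes):
        overlaps = [box_iou(a, b) for j, b in enumerate(boxes) if j != i]
        result.append(max(overlaps, default=0.0))
    return result


if __name__ == "__main__":
    # Two overlapping people and one isolated person in a single image.
    people = [(0, 0, 10, 10), (5, 0, 15, 10), (100, 100, 110, 110)]
    print(max_iou_per_person(people))
```

Under this reading, averaging the per-person values over a whole dataset would give a single occlusion score comparable to the 0.67 figure quoted above.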