Parsing R-CNN for Instance-Level Human Analysis

Instance-level human analysis is common in real-life scenarios and has multiple manifestations, such as human part segmentation, dense pose estimation, and human-object interactions. Models need to distinguish different human instances in the image panel and learn rich features to represent the details of each instance. In this paper, we present an end-to-end pipeline for instance-level human analysis, named Parsing R-CNN. It processes a set of human instances simultaneously by jointly considering the characteristics of the region-based approach and the appearance of a human, which allows it to represent the details of each instance. Parsing R-CNN is flexible and efficient, and is applicable to many tasks in human instance analysis. Our approach outperforms all state-of-the-art methods on the CIHP (Crowd Instance-level Human Parsing), MHP v2.0 (Multi-Human Parsing), and DensePose-COCO datasets. Based on the proposed Parsing R-CNN, we reached 1st place in the COCO 2018 Challenge DensePose Estimation task. Code and models are publicly available.