Mask R-CNN

We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code has been made available at: https://github.com/facebookresearch/Detectron
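The parallel-branch idea described above can be illustrated with a toy sketch. This is not the Detectron implementation: `box_head`, `mask_head`, `predict`, and `InstancePrediction` are hypothetical names, and the "features" are plain lists of floats standing in for pooled per-region features. The only point it shows is that the mask branch consumes the same per-region feature as the box branch and runs alongside it, so each detected instance carries both a box and a mask.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Feature = Sequence[float]               # toy stand-in for a pooled per-region feature
Box = Tuple[float, float, float, float]


@dataclass
class InstancePrediction:
    class_scores: List[float]  # output of the box-recognition branch
    box: List[float]           # output of the box-recognition branch
    mask: List[List[int]]      # output of the added mask branch


def box_head(feat: Feature, roi: Box) -> Tuple[List[float], List[float]]:
    """Hypothetical stand-in for the existing box-recognition branch."""
    total = sum(feat)
    scores = [total, -total]   # two made-up classes, for illustration only
    return scores, list(roi)   # identity "refinement" of the proposal


def mask_head(feat: Feature, size: int = 2) -> List[List[int]]:
    """Hypothetical stand-in for the mask branch: a size x size binary grid."""
    cells = [1 if v > 0 else 0 for v in feat[: size * size]]
    return [cells[r * size:(r + 1) * size] for r in range(size)]


def predict(rois: Sequence[Box], feats: Sequence[Feature]) -> List[InstancePrediction]:
    """Apply both branches to the same per-region feature."""
    results = []
    for roi, feat in zip(rois, feats):
        scores, box = box_head(feat, roi)  # existing branch
        mask = mask_head(feat)             # added branch, same input feature
        results.append(InstancePrediction(scores, box, mask))
    return results


preds = predict([(0.0, 0.0, 10.0, 10.0)], [[0.5, -1.0, 2.0, 0.0]])
```

In a real system both heads would be learned networks; the sketch only mirrors the data flow, in which neither branch depends on the other's output.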