BoxInst: High-Performance Instance Segmentation with Box Annotations

We present a high-performance method that can achieve mask-level instance segmentation with only bounding-box annotations for training. While this setting has been studied in the literature, here we show significantly stronger performance with a simple design (e.g., dramatically improving the previous best reported mask AP of 21.1% in Hsu et al. (2019) to 31.6% on the COCO dataset). Our core idea is to redesign the mask loss in instance segmentation, with no modification to the segmentation network itself. The new loss functions can supervise mask training without relying on mask annotations. This is made possible with two loss terms, namely: 1) a surrogate term that minimizes the discrepancy between the projections of the ground-truth box and the predicted mask; and 2) a pairwise loss that exploits the prior that proximal pixels with similar colors are very likely to have the same category label. Experiments demonstrate that the redesigned mask loss can yield surprisingly high-quality instance masks with only box annotations. For example, without using any mask annotations, with a ResNet-101 backbone and a 3x training schedule, we achieve 33.2% mask AP on the COCO test-dev split (vs. 39.1% for the fully supervised counterpart). Our excellent experimental results on COCO and Pascal VOC indicate that our method dramatically narrows the performance gap between weakly and fully supervised instance segmentation. Code is available at: https://git.io/AdelaiDet
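To make the two loss terms concrete, below is a minimal numpy sketch of what the abstract describes: a projection term that compares the x/y projections (max along each axis) of the predicted mask against those of the box, and a pairwise term that penalises label disagreement between neighbouring pixels with similar colors. All function names and the restriction to horizontal neighbours are illustrative simplifications, not the paper's actual implementation (which, among other details, uses a richer pixel neighbourhood).

```python
import numpy as np

def dice(p, q, eps=1e-6):
    """Soft Dice coefficient between two 1-D vectors of probabilities."""
    return (2.0 * (p * q).sum() + eps) / ((p * p).sum() + (q * q).sum() + eps)

def projection_loss(pred, box_mask):
    """Surrogate box-supervision term: the max-projection of the predicted
    mask onto each axis should match the projection of the ground-truth
    box mask (1 inside the box, 0 outside)."""
    loss_x = 1.0 - dice(pred.max(axis=0), box_mask.max(axis=0))
    loss_y = 1.0 - dice(pred.max(axis=1), box_mask.max(axis=1))
    return loss_x + loss_y

def pairwise_loss(pred, image, sim_thresh=0.9, sigma=2.0):
    """Pairwise color-similarity term (horizontal neighbours only, for
    brevity): where adjacent pixels have similar colors, penalise
    predicting different labels for them."""
    # probability that two neighbouring pixels receive the same label
    p, q = pred[:, :-1], pred[:, 1:]
    same = p * q + (1.0 - p) * (1.0 - q)
    # Gaussian color similarity between horizontal neighbours
    diff = np.linalg.norm(image[:, :-1] - image[:, 1:], axis=-1)
    sim = np.exp(-diff / sigma)
    edges = sim >= sim_thresh  # supervise only confidently-similar pairs
    if not edges.any():
        return 0.0
    return float(-np.log(same[edges] + 1e-6).mean())
```

A prediction that exactly fills the ground-truth box drives the projection term to zero, while the pairwise term pulls the mask boundary toward color edges inside the box; neither term ever consults a ground-truth mask.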