Box2Mask: Box-supervised Instance Segmentation via Level-set Evolution

In contrast to fully supervised methods using pixel-wise mask labels, box-supervised instance segmentation takes advantage of simple box annotations, which has recently attracted increasing research attention. This paper presents a novel single-shot instance segmentation approach, namely Box2Mask, which integrates the classical level-set evolution model into deep neural network learning to achieve accurate mask prediction with only bounding box supervision. Specifically, both the input image and its deep features are employed to evolve the level-set curves implicitly, and a local consistency module based on a pixel affinity kernel is used to mine the local context and spatial relations. Two types of single-stage frameworks, i.e., CNN-based and transformer-based frameworks, are developed to empower the level-set evolution for box-supervised instance segmentation, and each framework consists of three essential components: instance-aware decoder, box-level matching assignment and level-set evolution. By minimizing the level-set energy function, the mask map of each instance can be iteratively optimized within its bounding box annotation. The experimental results on five challenging testbeds, covering general scenes, remote sensing, medical and scene text images, demonstrate the outstanding performance of our proposed Box2Mask approach for box-supervised instance segmentation. In particular, with the Swin-Transformer large backbone, our Box2Mask obtains 42.4% mask AP on COCO, which is on par with the recently developed fully mask-supervised methods. The code is available at: https://github.com/LiWentomng/boxlevelset.
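
To make the idea of iteratively optimizing a mask inside its box by minimizing a level-set energy more concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not spell out the energy, so the sketch assumes a Chan-Vese-style two-region energy on raw pixel intensities only; it uses no neural network, no deep features, and no affinity kernel. The names `region_energy` and `evolve_in_box`, the initialization, and the step size are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Smooth stand-in for the Heaviside function of the level set."""
    return 1.0 / (1.0 + np.exp(-x))


def region_energy(u, phi):
    """Assumed Chan-Vese-style energy of level set `phi` on image crop `u`.

    Returns the energy and the soft mean intensities inside and outside.
    """
    h = sigmoid(phi)  # soft foreground membership in [0, 1]
    eps = 1e-8
    c_in = (u * h).sum() / (h.sum() + eps)
    c_out = (u * (1.0 - h)).sum() / ((1.0 - h).sum() + eps)
    energy = ((u - c_in) ** 2 * h + (u - c_out) ** 2 * (1.0 - h)).sum()
    return float(energy), c_in, c_out


def evolve_in_box(image, box, steps=200, lr=1.0):
    """Gradient descent on the energy, restricted to box = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    u = image[y0:y1, x0:x1].astype(float)
    hh, ww = u.shape

    # Hypothetical initialization: positive in the central third of the box,
    # negative elsewhere, so the two regions start with different means.
    phi = -np.ones_like(u)
    phi[hh // 3: hh - hh // 3, ww // 3: ww - ww // 3] = 1.0

    energies = []
    for _ in range(steps):
        energy, c_in, c_out = region_energy(u, phi)
        energies.append(energy)
        h = sigmoid(phi)
        # dE/dphi with the region means held fixed; h * (1 - h) is the
        # derivative of the sigmoid (a smooth delta function).
        grad = h * (1.0 - h) * ((u - c_in) ** 2 - (u - c_out) ** 2)
        phi -= lr * grad
    energies.append(region_energy(u, phi)[0])

    # The mask is forced to be empty outside the box annotation.
    mask = np.zeros(image.shape, dtype=bool)
    mask[y0:y1, x0:x1] = phi > 0
    return mask, energies


# Toy example: a bright square inside a looser bounding box.
image = np.zeros((32, 32))
image[10:20, 10:20] = 1.0
mask, energies = evolve_in_box(image, box=(6, 6, 24, 24))
```

In this toy setting the box is deliberately looser than the object, and the evolution shrinks the foreground region toward the bright square while the energy drops. In the approach the abstract describes, the level set would instead be predicted by the network and the energy would also draw on deep features, so this sketch only illustrates the box-constrained energy minimization step under the stated assumptions.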