Weakly Supervised Instance Segmentation using Class Peak Response

Weakly supervised instance segmentation with image-level labels, instead of expensive pixel-level masks, remains unexplored. In this paper, we tackle this challenging problem by exploiting class peak responses to enable a classification network for instance mask extraction. With image label supervision only, CNN classifiers in a fully convolutional manner can produce class response maps, which specify classification confidence at each image location. We observed that local maxima, i.e., peaks, in a class response map typically correspond to strong visual cues residing inside each instance. Motivated by this, we first design a process to stimulate peaks to emerge from a class response map. The emerged peaks are then back-propagated and effectively mapped to highly informative regions of each object instance, such as instance boundaries. We refer to the above maps generated from class peak responses as Peak Response Maps (PRMs). PRMs provide a fine-detailed instance-level representation, which allows instance masks to be extracted even with some off-the-shelf methods. To the best of our knowledge, we report results for the challenging image-level supervised instance segmentation task for the first time. Extensive experiments show that our method also boosts weakly supervised pointwise localization as well as semantic segmentation performance, and reports state-of-the-art results on popular benchmarks, including PASCAL VOC 2012 and MS COCO.
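
To make the notion of a peak concrete, the following is a minimal sketch of how local maxima might be located in a single class response map. It is an illustration only, not the implementation used in this work: the function name `find_peaks`, the window `radius`, the `threshold`, and the toy map values are all assumptions introduced here, and the peak stimulation and back-propagation steps are not covered.

```python
from typing import List, Tuple


def find_peaks(response: List[List[float]], radius: int = 1,
               threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Return (row, col) positions that are strict local maxima.

    A position counts as a peak if its value exceeds `threshold` and is
    strictly greater than every other value in the surrounding
    (2 * radius + 1) x (2 * radius + 1) window, clipped at the borders.
    """
    height = len(response)
    width = len(response[0]) if height else 0
    peaks = []
    for r in range(height):
        for c in range(width):
            value = response[r][c]
            if value <= threshold:
                continue  # too weak to be considered a peak
            is_peak = True
            for rr in range(max(0, r - radius), min(height, r + radius + 1)):
                for cc in range(max(0, c - radius), min(width, c + radius + 1)):
                    if (rr, cc) != (r, c) and response[rr][cc] >= value:
                        is_peak = False
                        break
                if not is_peak:
                    break
            if is_peak:
                peaks.append((r, c))
    return peaks


# Hypothetical 5x5 class response map with two strong responses.
toy_map = [
    [0.1, 0.2, 0.1, 0.0, 0.0],
    [0.2, 0.9, 0.3, 0.0, 0.1],
    [0.1, 0.3, 0.2, 0.4, 0.2],
    [0.0, 0.0, 0.4, 0.8, 0.3],
    [0.0, 0.1, 0.2, 0.3, 0.1],
]
print(find_peaks(toy_map, radius=1, threshold=0.5))  # [(1, 1), (3, 3)]
```

In this toy map the two detected positions would play the role of peaks that each lie inside a different object instance; in practice the map would come from a fully convolutional classifier rather than hand-written values.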