Semantic Instance Segmentation with a Discriminative Loss Function

Semantic instance segmentation remains a challenging task. In this work we propose to tackle the problem with a discriminative loss function, operating at the pixel level, that encourages a convolutional network to produce a representation of the image that can easily be clustered into instances with a simple post-processing step. The loss function encourages the network to map each pixel to a point in feature space so that pixels belonging to the same instance lie close together while different instances are separated by a wide margin. Our approach of combining an off-the-shelf network with a principled loss function inspired by a metric learning objective is conceptually simple and distinct from recent efforts in instance segmentation. In contrast to previous works, our method does not rely on object proposals or recurrent mechanisms. A key contribution of our work is to demonstrate that such a simple setup without bells and whistles is effective and can perform on par with more complex methods. Moreover, we show that it does not suffer from some of the limitations of the popular detect-and-segment approaches. We achieve competitive performance on the Cityscapes and CVPPP leaf segmentation benchmarks.
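To make the pull-together and push-apart idea concrete, the following is a minimal NumPy sketch of one possible loss of this kind. The abstract does not give the exact formulation, so the hinged pull and push terms, the function name `discriminative_loss`, and the margins `delta_pull` and `delta_push` are illustrative assumptions rather than the definition used in this work.

```python
import numpy as np


def discriminative_loss(embeddings, labels, delta_pull=0.5, delta_push=1.5):
    """Illustrative pull/push loss (hypothetical form, not the paper's definition).

    embeddings: (N, D) float array, one feature vector per pixel.
    labels:     (N,) int array, ground-truth instance id per pixel.
    """
    ids = np.unique(labels)
    # Mean embedding (cluster center) of each instance.
    centers = np.stack([embeddings[labels == i].mean(axis=0) for i in ids])

    # Pull term: penalise pixels farther than delta_pull from their own center.
    pull = 0.0
    for center, i in zip(centers, ids):
        dist = np.linalg.norm(embeddings[labels == i] - center, axis=1)
        pull += np.mean(np.maximum(0.0, dist - delta_pull) ** 2)
    pull /= len(ids)

    # Push term: penalise pairs of centers closer than 2 * delta_push.
    push = 0.0
    if len(ids) > 1:
        diff = centers[:, None, :] - centers[None, :, :]
        dist = np.linalg.norm(diff, axis=2)
        off_diag = ~np.eye(len(ids), dtype=bool)
        push = np.mean(np.maximum(0.0, 2.0 * delta_push - dist[off_diag]) ** 2)

    return pull + push


# Four "pixels" in a 2-D feature space forming two tight, distant groups.
pixels = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.1, 0.0]])
separated = discriminative_loss(pixels, np.array([0, 0, 1, 1]))
mixed = discriminative_loss(pixels, np.array([0, 1, 0, 1]))
```

Under these assumed margins, the loss vanishes when each instance forms a tight cluster far from the others (`separated`) and is positive when the instance labels cut across the clusters (`mixed`). Once embeddings have this structure, a simple clustering step in feature space is enough to recover the instances.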