Rethinking Atrous Convolution for Semantic Image Segmentation

In this work, we revisit atrous convolution, a powerful tool to explicitly adjust a filter's field-of-view as well as control the resolution of feature responses computed by Deep Convolutional Neural Networks, in the application of semantic image segmentation. To handle the problem of segmenting objects at multiple scales, we design modules which employ atrous convolution in cascade or in parallel to capture multi-scale context by adopting multiple atrous rates. Furthermore, we propose to augment our previously proposed Atrous Spatial Pyramid Pooling module, which probes convolutional features at multiple scales, with image-level features that encode global context and further boost performance. We also elaborate on implementation details and share our experience on training our system. The proposed 'DeepLabv3' system significantly improves over our previous DeepLab versions without DenseCRF post-processing and attains performance comparable to other state-of-the-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.
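To make the idea of adjusting a filter's field-of-view concrete, the following is a minimal 1-D sketch in plain Python, not the DeepLabv3 implementation. The function names (`atrous_conv1d`, `effective_kernel_size`, `parallel_atrous_1d`) are hypothetical and chosen for illustration. The parallel variant reuses one fixed kernel across rates and appends a global average as a crude stand-in for image-level features; both are simplifying assumptions, since a real network would learn separate 2-D filters per branch.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """1-D atrous (dilated) convolution, zero-padded, same output length.

    Computed as cross-correlation, as is conventional in deep learning.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel size")
    half = k // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - half) * rate  # taps are spaced `rate` samples apart
            if 0 <= idx < n:  # taps outside the signal read zero padding
                acc += w * signal[idx]
        out.append(acc)
    return out


def effective_kernel_size(kernel_size, rate):
    """Number of input samples spanned by a kernel dilated by `rate`."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def parallel_atrous_1d(signal, kernel, rates):
    """Apply one kernel at several rates in parallel (assumed simplification).

    A global mean is appended as an extra branch, standing in for
    image-level context. Returns one feature list per input position.
    """
    branches = [atrous_conv1d(signal, kernel, r) for r in rates]
    global_mean = sum(signal) / len(signal)
    branches.append([global_mean] * len(signal))
    return [list(column) for column in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    print(atrous_conv1d(impulse, [1, 2, 3], rate=1))
    print(atrous_conv1d(impulse, [1, 2, 3], rate=2))
    print(effective_kernel_size(3, 2))
```

With an impulse input, raising the rate spreads the same three weights over a wider span without adding parameters or reducing the output length, which is the property the abstract refers to as controlling the field-of-view and the feature resolution.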