Multi-scale self-guided attention for medical image segmentation

Even though convolutional neural networks (CNNs) are driving progress in medical image segmentation, standard models still have some drawbacks. First, the use of multi-scale approaches, i.e., encoder-decoder architectures, leads to a redundant use of information, where similar low-level features are extracted multiple times at multiple scales. Second, long-range feature dependencies are not efficiently modeled, resulting in non-optimal discriminative feature representations associated with each semantic class. In this paper we attempt to overcome these limitations with the proposed architecture by capturing richer contextual dependencies through guided self-attention mechanisms. This approach is able to integrate local features with their corresponding global dependencies, as well as highlight interdependent channel maps in an adaptive manner. Furthermore, an additional loss between different modules guides the attention mechanisms to neglect irrelevant information and focus on more discriminative regions of the image by emphasizing relevant feature associations. We evaluate the proposed model in the context of semantic segmentation on three different datasets: abdominal organs, cardiovascular structures, and brain tumors. A series of ablation experiments supports the importance of these attention modules in the proposed architecture. In addition, compared to other state-of-the-art segmentation networks, our model yields better segmentation performance, increasing the accuracy of the predictions while reducing the standard deviation. This demonstrates the efficiency of our approach in generating precise and reliable automatic segmentations of medical images. Our code is made publicly available at https://github.com/sinAshish/Multi-Scale-Attention
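To make the two attention mechanisms mentioned in the abstract concrete, the following is a minimal PyTorch sketch of a position (spatial) self-attention block, which relates every location to every other location, and a channel attention block, which re-weights interdependent channel maps. This is an illustrative assumption of how such modules are commonly implemented, not the authors' implementation; the official code is available at the repository linked above, and all module and variable names here are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PositionAttention(nn.Module):
    """Aggregates the feature at each position as a weighted sum over all positions."""

    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).view(b, -1, h * w).permute(0, 2, 1)        # B x HW x C'
        k = self.key(x).view(b, -1, h * w)                           # B x C' x HW
        attn = F.softmax(torch.bmm(q, k), dim=-1)                    # B x HW x HW
        v = self.value(x).view(b, -1, h * w)                         # B x C x HW
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x                                  # residual connection


class ChannelAttention(nn.Module):
    """Re-weights channel maps according to inter-channel similarity."""

    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        flat = x.view(b, c, -1)                                      # B x C x HW
        attn = F.softmax(torch.bmm(flat, flat.permute(0, 2, 1)), dim=-1)  # B x C x C
        out = torch.bmm(attn, flat).view(b, c, h, w)
        return self.gamma * out + x
```

In a multi-scale encoder-decoder, blocks like these would typically be applied to feature maps at several scales, with an auxiliary loss attached to each attended output so that the attention maps are guided toward the discriminative regions described in the abstract.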