MRL: Learning to Mix with Attention and Convolutions

Mohta, Shlok ; Suganuma, Hisahiro ; Tanaka, Yoshiki

Abstract

In this paper, we present a new neural architectural block for the vision domain, named Mixing Regionally and Locally (MRL), developed with the aim of effectively and efficiently mixing the provided input features. We bifurcate the input feature mixing task as mixing at a regional and local scale. To achieve an efficient mix, we exploit the domain-wide receptive field provided by self-attention for regional-scale mixing and convolutional kernels restricted to local scale for local-scale mixing. More specifically, our proposed method mixes regional features associated with local features within a defined region, followed by a local-scale features mix augmented by regional features. Experiments show that this hybridization of self-attention and convolution brings improved capacity, generalization (right inductive bias), and efficiency. Under similar network settings, MRL outperforms or is at par with its counterparts in classification, object detection, and segmentation tasks. We also show that our MRL-based network architecture achieves state-of-the-art performance for H&E histology datasets. We achieved DICE of 0.843, 0.855, and 0.892 for Kumar, CoNSep, and CPM-17 datasets, respectively, while highlighting the versatility offered by the MRL framework by incorporating layers like group convolutions to improve dataset-specific generalization.
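
The DICE figures quoted above are overlap scores between a predicted segmentation mask and a reference mask. As a minimal sketch, assuming the standard binary Dice coefficient 2|A ∩ B| / (|A| + |B|), the computation looks like the following. The abstract does not say which Dice variant or aggregation the authors used, so the function name `dice_score`, the flat 0/1 mask representation, and the empty-mask convention are illustrative assumptions rather than the paper's evaluation code.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat sequences of 0/1.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy example: 3 foreground pixels in each mask, 2 of them shared.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(round(dice_score(pred, target), 4))
```

In the toy example the masks share 2 of their 3 foreground pixels each, so the score is 2·2 / (3 + 3) ≈ 0.667. A score of 1.0 means the two masks are identical, and 0.0 means they have no foreground pixels in common.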