Residual Attention: A Simple but Effective Method for Multi-Label Recognition

Multi-label image recognition is a challenging computer vision task of practical use. Progress in this area, however, is often characterized by complicated methods, heavy computation, and a lack of intuitive explanations. To effectively capture the different spatial regions occupied by objects from different categories, we propose an embarrassingly simple module, named class-specific residual attention (CSRA). CSRA generates class-specific features for every category by proposing a simple spatial attention score, and then combines them with the class-agnostic average pooling feature. CSRA achieves state-of-the-art results on multi-label recognition, and at the same time is much simpler than existing methods. Furthermore, with only 4 lines of code, CSRA also leads to consistent improvement across many diverse pretrained models and datasets without any extra training. CSRA is both easy to implement and light in computation, and it also enjoys intuitive explanations and visualizations.
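
The combination described above, a class-specific feature built from a spatial attention score and added to the class-agnostic average pooling feature, can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function name `csra_logits`, the softmax-with-temperature form of the attention score, the weighting factor `lam`, and the array shapes are all hypothetical and may differ from the authors' implementation.

```python
import numpy as np


def csra_logits(features, class_weights, lam=0.2, temperature=1.0):
    """Illustrative residual-attention scoring (assumed form, not the official code).

    features:      (d, n) array, one d-dimensional vector per spatial location.
    class_weights: (c, d) array, one linear classifier per category.
    Returns a (c,) array of per-class logits.
    """
    # Per-location class scores, shape (c, n).
    scores = class_weights @ features

    # Class-agnostic branch: applying the classifier to the average-pooled
    # feature equals averaging the per-location scores (linearity).
    base = scores.mean(axis=1)

    # Assumed spatial attention: softmax over locations, separately per class.
    z = temperature * scores
    z = z - z.max(axis=1, keepdims=True)  # subtract the max for numerical stability
    attn = np.exp(z)
    attn = attn / attn.sum(axis=1, keepdims=True)

    # Class-specific branch: attention-weighted sum of per-location scores.
    specific = (attn * scores).sum(axis=1)

    # Residual combination of the two branches.
    return base + lam * specific
```

Under these assumptions, setting `lam=0` recovers plain average pooling followed by the classifier, and as `temperature` grows the attention-weighted term approaches the maximum score over spatial locations for each class.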