Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection

Most previous co-salient object detection works mainly focus on extracting co-salient cues via mining the consistency relations across images while ignoring explicit exploration of background regions. In this paper, we propose a Discriminative co-saliency and background Mining Transformer framework (DMT) based on several economical multi-grained correlation modules to explicitly mine both co-saliency and background information and effectively model their discrimination. Specifically, we first propose a region-to-region correlation module for introducing inter-image relations to pixel-wise segmentation features while maintaining computational efficiency. Then, we use two types of pre-defined tokens to mine co-saliency and background information via our proposed contrast-induced pixel-to-token correlation and co-saliency token-to-token correlation modules. We also design a token-guided feature refinement module to enhance the discriminability of the segmentation features under the guidance of the learned tokens. Segmentation feature extraction and token construction are performed iteratively so that the two mutually promote each other. Experimental results on three benchmark datasets demonstrate the effectiveness of our proposed method. The source code is available at: https://github.com/dragonlee258079/DMT.
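
To make the token-based interactions described above more concrete, the following is a minimal PyTorch sketch, not the authors' implementation, of how pre-defined co-saliency and background tokens could attend to pixel-wise features (pixel-to-token correlation) and then guide feature refinement. All module and parameter names here (e.g., `TokenPixelInteraction`, `dim`, `num_tokens`) are illustrative assumptions; the actual DMT modules differ in detail (see the repository above).

```python
# Illustrative sketch only: learnable co-saliency/background tokens interact
# with pixel-wise segmentation features via cross-attention.
import torch
import torch.nn as nn


class TokenPixelInteraction(nn.Module):
    """Cross-attention between two pre-defined tokens and pixel features."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Two pre-defined tokens: one for co-saliency, one for background.
        self.cosal_token = nn.Parameter(torch.randn(1, 1, dim))
        self.bg_token = nn.Parameter(torch.randn(1, 1, dim))
        # Tokens query the pixel features (pixel-to-token direction).
        self.token_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Pixel features query the updated tokens (token-guided refinement).
        self.pixel_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) pixel-wise segmentation features.
        b, c, h, w = feats.shape
        pixels = feats.flatten(2).transpose(1, 2)              # (B, H*W, C)
        tokens = torch.cat(
            [self.cosal_token, self.bg_token], dim=1
        ).expand(b, -1, -1)                                    # (B, 2, C)

        # Tokens gather co-saliency / background evidence from the pixels.
        tokens, _ = self.token_attn(tokens, pixels, pixels)

        # Pixel features are refined under the guidance of the learned tokens.
        refined, _ = self.pixel_attn(pixels, tokens, tokens)
        refined = self.norm(pixels + refined)
        return refined.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    x = torch.randn(2, 256, 32, 32)                 # a small batch of feature maps
    print(TokenPixelInteraction()(x).shape)          # torch.Size([2, 256, 32, 32])
```

In this hypothetical sketch, the two attention passes correspond roughly to the pixel-to-token and token-guided refinement steps mentioned in the abstract; the region-to-region and token-to-token correlation modules, as well as the iterative scheme, are omitted for brevity.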