Summarize and Search: Learning Consensus-aware Dynamic Convolution for Co-Saliency Detection

Humans perform co-saliency detection by first summarizing the consensus knowledge in the whole group and then searching for the corresponding objects in each image. Previous methods usually lack robustness, scalability, or stability for the first process and simply fuse consensus features with image features for the second process. In this paper, we propose a novel consensus-aware dynamic convolution model to explicitly and effectively perform the "summarize and search" process. To summarize consensus image features, we first summarize robust features for every single image using an effective pooling method and then aggregate cross-image consensus cues via the self-attention mechanism. By doing this, our model meets the scalability and stability requirements. Next, we generate dynamic kernels from consensus features to encode the summarized consensus knowledge. Two kinds of kernels are generated in a supplementary way to summarize fine-grained image-specific consensus object cues and the coarse group-wise common knowledge, respectively. Then, we can effectively perform object searching by employing dynamic convolution at multiple scales. Besides, a novel and effective data synthesis method is also proposed to train our network. Experimental results on four benchmark datasets verify the effectiveness of our proposed method. Our code and saliency maps are available at \url{https://github.com/nnizhang/CADC}.
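To make the "summarize and search" pipeline concrete, below is a minimal PyTorch-style sketch of the idea described in the abstract: per-image features are pooled into summaries, cross-image consensus is aggregated with self-attention, and a consensus-derived dynamic kernel is convolved with each image's features. All module and variable names are hypothetical illustrations, not the authors' released CADC code, and for brevity only the coarse group-wise kernel is sketched (the fine-grained image-specific kernels and multi-scale search are omitted).

```python
# Hedged sketch of "summarize and search", assuming a PyTorch implementation.
# Names (SummarizeAndSearch, group_kernel_head, ...) are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SummarizeAndSearch(nn.Module):
    def __init__(self, channels=256, kernel_size=3):
        super().__init__()
        self.channels = channels
        self.kernel_size = kernel_size
        # Self-attention to aggregate cross-image consensus cues from
        # the per-image summarized features.
        self.attn = nn.MultiheadAttention(channels, num_heads=4, batch_first=True)
        # Head that turns the group consensus vector into dynamic kernel weights.
        self.group_kernel_head = nn.Linear(channels, channels * kernel_size * kernel_size)

    def forward(self, feats):
        # feats: (N, C, H, W) backbone features of the N images in one group.
        n, c, h, w = feats.shape
        # 1) Summarize: pool each image's feature map into a vector
        #    (global average pooling stands in for the paper's pooling step).
        summaries = F.adaptive_avg_pool2d(feats, 1).flatten(1)          # (N, C)
        # 2) Aggregate cross-image cues with self-attention; averaging over
        #    the group gives a coarse group-wise consensus vector.
        consensus, _ = self.attn(summaries[None], summaries[None], summaries[None])
        group_consensus = consensus.squeeze(0).mean(dim=0)              # (C,)
        # 3) Generate a group-wise dynamic depthwise kernel and
        # 4) search: convolve every image's features with that kernel.
        kernel = self.group_kernel_head(group_consensus)
        kernel = kernel.view(c, 1, self.kernel_size, self.kernel_size)
        searched = F.conv2d(feats, kernel, padding=self.kernel_size // 2, groups=c)
        return searched  # consensus-aware features, e.g. for a saliency head

# Usage: a group of 5 images with 256-channel, 32x32 backbone features.
model = SummarizeAndSearch()
group_feats = torch.randn(5, 256, 32, 32)
print(model(group_feats).shape)  # torch.Size([5, 256, 32, 32])
```

Using a depthwise dynamic convolution here is one simple way to let the consensus vector act as a per-channel object "template" that is searched over every spatial location; the actual model additionally generates image-specific kernels and applies the search at multiple scales.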