Re-thinking Co-Salient Object Detection

In this paper, we conduct a comprehensive study on the co-salient object detection (CoSOD) problem for images. CoSOD is an emerging and rapidly growing extension of salient object detection (SOD), which aims to detect the co-occurring salient objects in a group of images. However, existing CoSOD datasets often have a serious data bias, assuming that each group of images contains salient objects of similar visual appearances. This bias creates idealized settings, so the effectiveness of models trained on existing datasets is impaired in real-life situations, where the similarities are usually semantic or conceptual. To tackle this issue, we first introduce a new benchmark, called CoSOD3k in the wild, which requires a large amount of semantic context, making it more challenging than existing CoSOD datasets. Our CoSOD3k consists of 3,316 high-quality, elaborately selected images divided into 160 groups with hierarchical annotations. The images span a wide range of categories, shapes, object sizes, and backgrounds. Second, we integrate existing SOD techniques to build a unified, trainable CoSOD framework, which is long overdue in this field. Specifically, we propose a novel CoEG-Net that augments our prior model EGNet with a co-attention projection strategy to enable fast common information learning. CoEG-Net fully leverages previous large-scale SOD datasets and significantly improves the model's scalability and stability. Third, we comprehensively summarize 40 cutting-edge algorithms, benchmarking 18 of them over three challenging CoSOD datasets (iCoSeg, CoSal2015, and our CoSOD3k) and reporting more detailed (i.e., group-level) performance analysis. Finally, we discuss the challenges and future directions of CoSOD. We hope that our study will give a strong boost to growth in the CoSOD community. The benchmark toolbox and results are available on our project page at http://dpfan.net/CoSOD3K/.
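To make the co-attention projection idea concrete, the following is a minimal sketch of how per-image SOD backbone features might be re-weighted by a group-level consensus. It assumes a PyTorch-style implementation; the module name `CoAttentionProjection`, the tensor shapes, and the simple mean-based consensus are illustrative assumptions for exposition only, not the authors' exact CoEG-Net design.

```python
# Illustrative sketch of a co-attention projection over per-image backbone
# features (an assumption for exposition, NOT the exact CoEG-Net module).
# Assumes PyTorch and features of shape (N, C, H, W) for a group of N images.
import torch
import torch.nn as nn


class CoAttentionProjection(nn.Module):
    """Re-weight each image's features by a shared group-level consensus."""

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (N, C, H, W) features of one image group from a SOD backbone.
        n, c, h, w = feats.shape
        q = self.query(feats).flatten(2)               # (N, C, H*W)
        k = self.key(feats).flatten(2)                 # (N, C, H*W)
        # Group consensus: average key features over all images and locations.
        consensus = k.mean(dim=(0, 2), keepdim=True)   # (1, C, 1)
        # Co-attention map: similarity of every location to the consensus.
        attn = (q * consensus).sum(dim=1, keepdim=True)  # (N, 1, H*W)
        attn = torch.sigmoid(attn).view(n, 1, h, w)
        # Modulate per-image features with the shared attention map.
        return feats * attn


if __name__ == "__main__":
    group_feats = torch.randn(5, 256, 32, 32)  # 5 related images in a group
    coatt = CoAttentionProjection(256)
    print(coatt(group_feats).shape)            # torch.Size([5, 256, 32, 32])
```

In such a design, the modulated features would then be passed to an existing SOD decoder (e.g., from EGNet), which is what allows large-scale single-image SOD training data to be reused for the co-salient setting.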