Highly Accurate Dichotomous Image Segmentation

We present a systematic study of a new task called dichotomous image segmentation (DIS), which aims to segment objects from natural images with high accuracy. To this end, we collected the first large-scale DIS dataset, called DIS5K, which contains 5,470 high-resolution (e.g., 2K, 4K or larger) images covering camouflaged, salient, or meticulous objects in various backgrounds. DIS5K is annotated with extremely fine-grained labels. In addition, we introduce a simple intermediate supervision baseline (IS-Net) that uses both feature-level and mask-level guidance for DIS model training. IS-Net outperforms various cutting-edge baselines on the proposed DIS5K, making it a general self-learned supervision network that can facilitate future research in DIS. Further, we design a new metric called human correction efforts (HCE), which approximates the number of mouse-click operations required to correct the false positives and false negatives. HCE measures the gap between models and real-world applications and thus complements existing metrics. Finally, we conduct the largest-scale benchmark, evaluating 16 representative segmentation models, providing a more insightful discussion of object complexities, and showing several potential applications (e.g., background removal, art design, 3D reconstruction). We hope these efforts open up promising directions for both academia and industry. Project page: https://xuebinqin.github.io/dis/index.html.