Patch-Depth Fusion: Dichotomous Image Segmentation via Fine-Grained Patch Strategy and Depth Integrity-Prior

Dichotomous Image Segmentation (DIS) is a high-precision object segmentation task for high-resolution natural images. Current mainstream methods focus on optimizing local details but overlook the fundamental challenge of modeling object integrity. We find that the depth integrity-prior implicit in the pseudo-depth maps generated by Depth Anything Model v2, together with the local detail features of image patches, can jointly address this problem. Based on this finding, we design a novel Patch-Depth Fusion Network (PDFNet) for high-precision dichotomous image segmentation. The core of PDFNet consists of three aspects. First, object perception is enhanced through multi-modal input fusion, and a fine-grained patch strategy, coupled with patch selection and enhancement, improves sensitivity to details. Second, by leveraging the depth integrity-prior distributed in the depth maps, we propose an integrity-prior loss to enhance the uniformity of the segmentation results in the depth maps. Finally, we reuse the features of the shared encoder and, through a simple depth refinement decoder, improve the ability of the shared encoder to capture subtle depth-related information in the images. Experiments on the DIS-5K dataset show that PDFNet significantly outperforms state-of-the-art non-diffusion methods. Owing to the incorporation of the depth integrity-prior, PDFNet matches or even surpasses the performance of the latest diffusion-based methods while using less than 11% of their parameters. The source code is available at https://github.com/Tennine2077/PDFNet
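
As a rough illustration of two ideas named above, the following minimal Python sketch splits an image into non-overlapping patches and measures how uniform the depth is inside a predicted mask. It is not the PDFNet implementation: the function names `split_into_patches` and `masked_depth_variance`, the use of nested lists, and the choice of mask-weighted variance as a uniformity measure are all assumptions made for illustration. The actual patch selection and integrity-prior loss are defined in the paper and the source code.

```python
def split_into_patches(image, size):
    """Split a 2-D list (H x W) into non-overlapping size x size patches.

    Hypothetical helper: patches are returned in row-major order.
    """
    h, w = len(image), len(image[0])
    if h % size or w % size:
        raise ValueError("image dimensions must be divisible by the patch size")
    return [
        [row[x:x + size] for row in image[y:y + size]]
        for y in range(0, h, size)
        for x in range(0, w, size)
    ]


def masked_depth_variance(mask, depth, eps=1e-8):
    """Mask-weighted variance of depth values.

    Assumed stand-in for a uniformity measure, not the paper's loss:
    a low value means the predicted foreground is uniform in depth.
    """
    weights = [m for row in mask for m in row]
    values = [d for row in depth for d in row]
    total = sum(weights) + eps  # eps guards against an empty mask
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total


if __name__ == "__main__":
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(len(split_into_patches(image, 2)))  # number of 2 x 2 patches

    depth = [[0.5, 0.5], [0.9, 0.1]]
    print(masked_depth_variance([[1, 1], [0, 0]], depth))  # uniform region
    print(masked_depth_variance([[0, 0], [1, 1]], depth))  # mixed region
```

In this toy setting, a mask that covers pixels of similar depth yields a variance near zero, while a mask that straddles very different depths yields a larger value, which is the kind of signal a depth-based integrity term could penalize.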