SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation

Image segmentation plays an important role in vision understanding. Recently, emerging vision foundation models have continuously achieved superior performance on various tasks. Following this success, in this paper we show that the Segment Anything Model 2 (SAM2) can be a strong encoder for U-shaped segmentation models. We propose a simple but effective framework, termed SAM2-UNet, for versatile image segmentation. Specifically, SAM2-UNet adopts the Hiera backbone of SAM2 as the encoder, while the decoder uses the classic U-shaped design. Additionally, adapters are inserted into the encoder to allow parameter-efficient fine-tuning. Preliminary experiments on various downstream tasks, such as camouflaged object detection, salient object detection, marine animal segmentation, mirror detection, and polyp segmentation, demonstrate that SAM2-UNet outperforms existing specialized state-of-the-art methods without bells and whistles. Project page: https://github.com/WZH0120/SAM2-UNet.
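
To make the idea of adapter-based, parameter-efficient fine-tuning concrete, the following is a minimal sketch of a generic bottleneck adapter in plain Python. It is not taken from the SAM2-UNet code: the class name `BottleneckAdapter`, the residual form `x + up(gelu(down(x)))`, the zero-initialised up-projection, and the sizes `dim=8`, `hidden=2` are all assumptions chosen for illustration. The abstract does not specify the adapter design, so the actual implementation may differ.

```python
import math
import random


def gelu(x: float) -> float:
    """Exact GELU, computed with the Gaussian error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x, weight, bias):
    """Compute y = W x + b for one feature vector stored as a Python list."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


class BottleneckAdapter:
    """Hypothetical generic adapter: x + up(gelu(down(x))).

    Only these small projections would be trained; the encoder block
    that the adapter is attached to would stay frozen.
    """

    def __init__(self, dim: int, hidden: int, seed: int = 0):
        rng = random.Random(seed)
        # Down-projection: dim -> hidden, small random weights.
        self.down_w = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(hidden)]
        self.down_b = [0.0] * hidden
        # Up-projection: hidden -> dim, zero-initialised (an assumption here)
        # so that the adapter starts out as the identity mapping.
        self.up_w = [[0.0] * hidden for _ in range(dim)]
        self.up_b = [0.0] * dim

    def __call__(self, x):
        h = [gelu(v) for v in linear(x, self.down_w, self.down_b)]
        delta = linear(h, self.up_w, self.up_b)
        return [a + d for a, d in zip(x, delta)]  # residual connection

    def num_params(self) -> int:
        dim, hidden = len(self.up_w), len(self.down_w)
        return 2 * dim * hidden + dim + hidden  # two weight matrices + two biases


adapter = BottleneckAdapter(dim=8, hidden=2)
features = [0.5, -1.0, 2.0, 0.0, 3.5, -0.25, 1.0, 4.0]
print(adapter(features) == features)  # identity at initialisation
print(adapter.num_params())           # trainable parameters in this adapter
```

The trainable parameter count grows as 2 x dim x hidden, so a small bottleneck keeps the added parameters far below those of a full encoder block, which is the usual motivation for this kind of fine-tuning.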