Explicit Visual Prompting for Universal Foreground Segmentation

Foreground segmentation is a fundamental problem in computer vision that includes salient object detection, forgery detection, defocus blur detection, shadow detection, and camouflaged object detection. Previous works have typically relied on domain-specific solutions to address accuracy and robustness issues in these applications. In this paper, we present a unified framework for a number of foreground segmentation tasks without any task-specific designs. We take inspiration from the widely used pre-training and then prompt-tuning protocols in NLP and propose a new visual prompting model, named Explicit Visual Prompting (EVP). Unlike previous visual prompting, which is typically a dataset-level implicit embedding, our key insight is to enforce the tunable parameters to focus on the explicit visual content of each individual image, i.e., the features from the frozen patch embeddings and the high-frequency components. Our method freezes a pre-trained model and then learns task-specific knowledge using a small number of extra parameters. Despite introducing only a few tunable parameters, EVP achieves superior performance to full fine-tuning and other parameter-efficient fine-tuning methods. Experiments on fourteen datasets across five tasks show that the proposed method outperforms other task-specific methods while being considerably simpler. The proposed method also demonstrates scalability across different architectures, pre-trained weights, and tasks. The code is available at: https://github.com/NiFangBaAGe/Explicit-Visual-Prompt.
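To make the two "explicit prompts" concrete, below is a minimal PyTorch sketch of the general idea: an image's high-frequency components are isolated by masking low frequencies in the Fourier domain, and a small tunable adaptor combines them with the frozen backbone's patch embeddings. This is an illustrative sketch only; the function `high_freq_components`, the class `Adaptor`, and hyperparameters such as `mask_ratio` and `bottleneck` are assumptions for exposition, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.fft


def high_freq_components(x: torch.Tensor, mask_ratio: float = 0.25) -> torch.Tensor:
    """Keep only the high-frequency content of an image batch (B, C, H, W)
    by zeroing a centered low-frequency square in the Fourier domain."""
    freq = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"))
    _, _, h, w = x.shape
    ch, cw = h // 2, w // 2
    rh, rw = int(h * mask_ratio / 2), int(w * mask_ratio / 2)
    mask = torch.ones_like(freq)
    mask[..., ch - rh:ch + rh, cw - rw:cw + rw] = 0  # suppress low frequencies
    hfc = torch.fft.ifft2(torch.fft.ifftshift(freq * mask), norm="ortho")
    return hfc.real


class Adaptor(nn.Module):
    """Lightweight tunable module (hypothetical): fuse the frozen patch
    embedding with an embedding of the high-frequency components through a
    low-dimensional bottleneck, yielding a per-image explicit prompt."""

    def __init__(self, dim: int, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, patch_embed: torch.Tensor, hfc_embed: torch.Tensor) -> torch.Tensor:
        # The prompt is computed from each image's own content,
        # rather than being a dataset-level learned embedding.
        return self.up(self.act(self.down(patch_embed + hfc_embed)))


# Usage sketch: freeze the pre-trained backbone and train only the adaptors,
# so task-specific knowledge lives in a small number of extra parameters.
# for p in backbone.parameters():
#     p.requires_grad = False
```

Freezing the backbone and training only such bottleneck adaptors keeps the tunable parameter count small, while deriving the prompt from each image's frequency content is what makes the prompting "explicit" rather than an implicit, dataset-level embedding.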