Explicit Visual Prompting for Low-Level Structure Segmentations

We consider the generic problem of detecting low-level structures in images, which includes segmenting manipulated parts, identifying out-of-focus pixels, separating shadow regions, and detecting concealed objects. Whereas each of these topics has typically been addressed with a domain-specific solution, we show that a unified approach performs well across all of them. We take inspiration from the widely used pre-training and prompt-tuning protocols in NLP and propose a new visual prompting model, named Explicit Visual Prompting (EVP). Unlike previous visual prompting, which is typically a dataset-level implicit embedding, our key insight is to make the tunable parameters focus on the explicit visual content of each individual image, i.e., the features from frozen patch embeddings and the input's high-frequency components. The proposed EVP significantly outperforms other parameter-efficient tuning protocols under the same number of tunable parameters (5.7% extra trainable parameters for each task). EVP also achieves state-of-the-art performance on diverse low-level structure segmentation tasks compared to task-specific solutions. Our code is available at: https://github.com/NiFangBaAGe/Explicit-Visual-Prompt.
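
To make the notion of an input's high-frequency components concrete, the following is a minimal sketch rather than the EVP implementation. It assumes the components are obtained by zeroing a centered low-frequency block in the Fourier domain; the function name `high_frequency_component` and the `ratio` parameter are hypothetical and do not come from the abstract.

```python
import numpy as np


def high_frequency_component(image: np.ndarray, ratio: float = 0.25) -> np.ndarray:
    """Suppress low frequencies of a 2-D grayscale image.

    Assumption (not stated in the abstract): low frequencies are removed by
    zeroing a centered block of the shifted spectrum whose side is roughly
    `ratio` times the image side.
    """
    h, w = image.shape
    # Move the zero-frequency (DC) term to index (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    half_h = int(h * ratio / 2)
    half_w = int(w * ratio / 2)
    cy, cx = h // 2, w // 2

    # Boolean mask: False inside the low-frequency block, True elsewhere.
    # The block is symmetric around DC, so the inverse transform stays real.
    mask = np.ones((h, w), dtype=bool)
    mask[cy - half_h: cy + half_h + 1, cx - half_w: cx + half_w + 1] = False

    filtered = spectrum * mask
    # Undo the shift and return to the spatial domain.
    return np.real(np.fft.ifft2(np.fft.ifftshift(filtered)))


# Usage: a flat image has no high-frequency content, while a pixel-level
# checkerboard consists only of the highest frequency and passes unchanged.
flat = np.full((32, 32), 5.0)
rows, cols = np.indices((32, 32))
checker = np.where((rows + cols) % 2 == 0, 1.0, -1.0)
flat_hf = high_frequency_component(flat)
checker_hf = high_frequency_component(checker)
```

In this sketch, a larger `ratio` removes more of the spectrum, so only finer detail such as edges and texture boundaries remains in the output.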