HyperAI

Visual Prompt Tuning

Visual Prompt Tuning (VPT) is a parameter-efficient fine-tuning method that introduces a small number of task-specific learnable parameters in the input space while keeping the pretrained Transformer backbone frozen. During downstream training, only these prompt parameters and the linear head are optimized. VPT performs well in low-data regimes and maintains its advantage across different data scales. It also remains competitive across Transformer scales and designs (such as ViT-Base/Large/Huge and Swin), making it an effective way to adapt increasingly large visual backbones.
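The idea above can be sketched in a few lines of PyTorch: learnable prompt tokens are prepended to the patch-token sequence, the backbone's weights are frozen, and only the prompts and the linear head receive gradients. This is a minimal illustrative sketch, not the paper's exact implementation; the module name, prompt count, and mean pooling are assumptions made for brevity.

```python
import torch
import torch.nn as nn

class VisualPromptTuning(nn.Module):
    """Minimal VPT-style sketch (hypothetical class, not the official code):
    learnable prompt tokens are prepended in the input space while the
    pretrained backbone stays frozen; only prompts + head are trained."""

    def __init__(self, backbone: nn.Module, embed_dim: int,
                 num_prompts: int = 10, num_classes: int = 100):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # freeze the pretrained Transformer
        # task-specific learnable parameters in the input space
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, embed_dim) * 0.02)
        self.head = nn.Linear(embed_dim, num_classes)  # trained with the prompts

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (batch, seq_len, embed_dim) from the patch embedding
        b = patch_tokens.size(0)
        x = torch.cat([self.prompts.expand(b, -1, -1), patch_tokens], dim=1)
        x = self.backbone(x)              # frozen Transformer encoder
        return self.head(x.mean(dim=1))   # pooled features -> linear head
```

Because the backbone is frozen, the number of trainable parameters scales only with the prompt length and the head, which is why the method remains cheap even for Large/Huge backbones.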