MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations

Alkin, Benedikt; Miklautz, Lukas; Hochreiter, Sepp; Brandstetter, Johannes
Abstract

We introduce MIM (Masked Image Modeling)-Refiner, a contrastive learning boost for pre-trained MIM models. MIM-Refiner is motivated by the insight that strong representations within MIM models generally reside in intermediate layers. Accordingly, MIM-Refiner leverages multiple contrastive heads that are connected to different intermediate layers. In each head, a modified nearest neighbor objective constructs clusters that capture semantic information, which improves performance on downstream tasks, including off-the-shelf and fine-tuning settings.

The refinement process is short and simple, yet highly effective. Within a few epochs, we refine the features of MIM models from subpar to state-of-the-art, off-the-shelf features. Refining a ViT-H, pre-trained with data2vec 2.0 on ImageNet-1K, sets a new state-of-the-art in linear probing (84.7%) and low-shot classification among models that are pre-trained on ImageNet-1K. MIM-Refiner efficiently combines the advantages of MIM and ID objectives and compares favorably against previous state-of-the-art SSL models on a variety of benchmarks such as low-shot classification, long-tailed classification, clustering, and semantic segmentation.
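The core idea above — contrastive heads attached to several intermediate layers, each using a nearest-neighbor swap as the positive — can be sketched as follows. This is a minimal, hypothetical illustration in PyTorch, not the paper's actual code: the class names, projection sizes, queue size, and temperature are assumptions, and the per-layer features are random stand-ins for intermediate ViT block outputs of a pre-trained MIM encoder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveHead(nn.Module):
    """Illustrative head: projects intermediate features and looks up a
    nearest neighbor in a queue of past embeddings (NNCLR-style)."""
    def __init__(self, dim, proj_dim=256, queue_size=1024):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, proj_dim)
        )
        # FIFO queue of past embeddings; here initialized randomly and kept
        # fixed for brevity (a real loop would enqueue/dequeue each step).
        self.register_buffer(
            "queue", F.normalize(torch.randn(queue_size, proj_dim), dim=1)
        )

    def forward(self, feats):
        z = F.normalize(self.proj(feats), dim=1)
        # swap each embedding for its nearest neighbor from the queue;
        # the NN carries no gradient (the queue is a buffer)
        nn_idx = (z @ self.queue.t()).argmax(dim=1)
        return z, self.queue[nn_idx]

def info_nce(anchor, positive, temperature=0.1):
    """Standard InfoNCE: matching pairs along the diagonal are positives."""
    logits = anchor @ positive.t() / temperature
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)

# One head per chosen intermediate block of the frozen-or-finetuned encoder.
dim, batch = 768, 8
heads = nn.ModuleList(ContrastiveHead(dim) for _ in range(3))

# Two augmented views yield features at each tapped layer (random stand-ins).
view1 = [torch.randn(batch, dim) for _ in heads]
view2 = [torch.randn(batch, dim) for _ in heads]

# Per head: the NN of view 1 attracts view 2's embedding; losses are averaged.
loss = sum(
    info_nce(head(f1)[1], head(f2)[0])
    for head, f1, f2 in zip(heads, view1, view2)
) / len(heads)
loss.backward()
```

Because the nearest neighbor comes from a non-differentiable queue, gradients flow only through the second view's projection, which is what pulls semantically similar samples (rather than only augmentations of the same image) together into clusters.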
