HyperAI

Self-Supervised Image Classification on ImageNet

Metrics

Number of Params
Top-1 Accuracy

Results

Performance results of various models on this benchmark

| Model Name | Number of Params | Top-1 Accuracy | Paper Title |
| --- | --- | --- | --- |
| DINOv2 distilled (ViT-S/14) | 21M | 81.1% | DINOv2: Learning Robust Visual Features without Supervision |
| SwAV (ResNet-50 x2) | 94M | 77.3% | Unsupervised Learning of Visual Features by Contrasting Cluster Assignments |
| MoCo (ResNet-50 4x) | 375M | 68.6% | Momentum Contrast for Unsupervised Visual Representation Learning |
| EsViT (Swin-S) | 49M | 80.8% | Efficient Self-supervised Vision Transformers for Representation Learning |
| AMDIM (arXiv v1) | 337M | 60.2% | Learning Representations by Maximizing Mutual Information Across Views |
| SwAV (ResNet-50) | 24M | 75.3% | Unsupervised Learning of Visual Features by Contrasting Cluster Assignments |
| CaCo (ResNet-50) | 24M | 75.7% | CaCo: Both Positive and Negative Samples are Directly Learnable via Cooperative-adversarial Contrastive Learning |
| iGPT-XL (64x64, 3072 features) | 6800M | 68.7% | Generative Pretraining from Pixels |
| LocalAgg (ResNet-50) | 24M | 60.2% | Local Aggregation for Unsupervised Learning of Visual Embeddings |
| MAE-CT (ViT-H/16) | 632M | 82.2% | Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget |
| iGPT-L (48x48) | 1400M | 65.2% | Generative Pretraining from Pixels |
| EsViT (Swin-B) | 87M | 81.3% | Efficient Self-supervised Vision Transformers for Representation Learning |
| PercMAE (ViT-B) | 80M | 78.1% | Improving Visual Representation Learning through Perceptual Understanding |
| MAE (ViT-B) | 80M | 68.0% | Masked Autoencoders Are Scalable Vision Learners |
| DINOv2 distilled (ViT-B/14) | 85M | 84.5% | DINOv2: Learning Robust Visual Features without Supervision |
| SimCLRv2 (ResNet-50 x2) | 94M | 75.6% | Big Self-Supervised Models are Strong Semi-Supervised Learners |
| MIM-Refiner (MAE-ViT-2B/14) | 1890M | 84.5% | MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations |
| MV-MR | – | 74.5% | MV-MR: multi-views and multi-representations for self-supervised learning and knowledge distillation |
| MAE (ViT-L) | 306M | 75.8% | Masked Autoencoders Are Scalable Vision Learners |
| ReLICv2 (ResNet101) | 44M | 78.7% | Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet? |
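The Top-1 Accuracy column reports the fraction of validation images for which the model's single highest-scoring class prediction matches the ground-truth label (for this benchmark, typically evaluated with a classifier trained on top of frozen self-supervised features). A minimal sketch of the metric itself, with an illustrative function name and toy data:

```python
import numpy as np

def top1_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of samples whose highest-scoring class equals the true label.

    logits: (num_samples, num_classes) array of class scores.
    labels: (num_samples,) array of ground-truth class indices.
    """
    preds = np.argmax(logits, axis=1)  # predicted class per sample
    return float(np.mean(preds == labels))

# Toy example: 3 samples, 3 classes; two predictions are correct.
logits = np.array([[0.1, 2.0, 0.3],
                   [1.5, 0.2, 0.1],
                   [0.0, 0.1, 3.0]])
labels = np.array([1, 0, 1])
print(f"{top1_accuracy(logits, labels):.1%}")  # prints "66.7%"
```

A model reporting 81.1% top-1, like DINOv2 distilled (ViT-S/14) above, gets the argmax class right on roughly 4 out of 5 of the 50,000 ImageNet validation images.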