HyperAI초신경
Self Supervised Image Classification
Self Supervised Image Classification On 1
Evaluation Metrics: Number of Params, Top 1 Accuracy

Evaluation Results
Performance results of each model on this benchmark:
| Model Name | Number of Params | Top 1 Accuracy | Paper Title |
|---|---|---|---|
| DINOv2 (ViT-g/14, 448) | 1100M | 88.9% | DINOv2: Learning Robust Visual Features without Supervision |
| EsViT (Swin-B) | 87M | 83.9% | Efficient Self-supervised Vision Transformers for Representation Learning |
| iBOT (ViT-L/16) | 307M | 84.8% | iBOT: Image BERT Pre-Training with Online Tokenizer |
| A2MIM (ResNet-50 RSB-A2) | - | 80.4% | Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN |
| MoCo (ResNet-50) | - | 77.0% | Momentum Contrast for Unsupervised Visual Representation Learning |
| iBOT (ViT-L/16) | 307M | 86.6% | iBOT: Image BERT Pre-Training with Online Tokenizer |
| MIRL (ViT-B-48) | 341M | 86.2% | Masked Image Residual Learning for Scaling Deeper Vision Transformers |
| SimMIM (SwinV2-H, 512) | 658M | 87.1% | SimMIM: A Simple Framework for Masked Image Modeling |
| DnC (ResNet-50) | - | 78.2% | Divide and Contrast: Self-supervised Learning from Uncurated Data |
| A2MIM+ (ViT-B) | - | 84.5% | Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN |
| PercMAE (ViT-L, dVAE) | 307M | 88.6% | Improving Visual Representation Learning through Perceptual Understanding |
| A2MIM+ (ViT-S) | - | 82.4% | Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN |
| DINOv2 (ViT-g/14) | 1100M | 88.5% | DINOv2: Learning Robust Visual Features without Supervision |
| ResNet-152 (SparK pre-training) | 60M | 82.7% | Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling |
| ConvNeXt-Base (SparK pre-training) | 89M | 84.8% | Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling |
| iBOT (ViT-B/16) | 85M | 84.0% | iBOT: Image BERT Pre-Training with Online Tokenizer |
| BEiT-L (ViT) | 307M | 86.3% | BEiT: BERT Pre-Training of Image Transformers |
| MaskFeat (ViT-L) | 307M | 85.7% | Masked Feature Prediction for Self-Supervised Visual Pre-Training |
| A2MIM (ViT-B) | - | 84.2% | Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN |
| SwAV (ResNeXt-101-32x16d) | 193M | 82.0% | Unsupervised Learning of Visual Features by Contrasting Cluster Assignments |
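The leaderboard rows can be handled programmatically, for example to rank models by Top-1 accuracy. A minimal sketch, assuming the rows are transcribed by hand from the table above (only a subset is included here; `Entry` and `LEADERBOARD` are hypothetical names, and `None` stands in for an unreported parameter count):

```python
from typing import NamedTuple, Optional

class Entry(NamedTuple):
    model: str
    params_m: Optional[int]  # parameter count in millions; None if unreported
    top1: float              # Top-1 accuracy, percent

# A subset of the benchmark rows, transcribed from the table above.
LEADERBOARD = [
    Entry("DINOv2 (ViT-g/14, 448)", 1100, 88.9),
    Entry("PercMAE (ViT-L, dVAE)", 307, 88.6),
    Entry("DINOv2 (ViT-g/14)", 1100, 88.5),
    Entry("SimMIM (SwinV2-H, 512)", 658, 87.1),
    Entry("iBOT (ViT-L/16)", 307, 86.6),
    Entry("MoCo (ResNet-50)", None, 77.0),
]

# Rank by accuracy, highest first.
ranked = sorted(LEADERBOARD, key=lambda e: e.top1, reverse=True)
print(ranked[0].model, ranked[0].top1)  # DINOv2 (ViT-g/14, 448) 88.9
```

Keeping the parameter count optional mirrors the table, where several entries report no size; filtering on `e.params_m is not None` before any size-based comparison avoids type errors.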