Image Classification on ImageNet ReaL
Metrics: Accuracy, Params
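Here, Accuracy is ReaL top-1 accuracy: ImageNet-ReaL ("Reassessed Labels", Beyer et al., 2020) re-annotates the 50,000 ImageNet validation images with multi-label ground truth, and a prediction counts as correct if it matches any of an image's reassessed labels; images whose label set is empty are excluded. As a minimal sketch of the scoring rule (assuming the `real.json` annotation file from the google-research/reassessed-imagenet repository, and a hypothetical `predictions` list produced by the model under test):

```python
import json

def real_accuracy(predictions, real_labels):
    """ReaL top-1 accuracy: a prediction is correct if it appears in the
    image's reassessed label set; images with an empty label set are
    excluded from the denominator."""
    correct, total = 0, 0
    for pred, labels in zip(predictions, real_labels):
        if not labels:  # image judged to contain no valid class: skip
            continue
        total += 1
        correct += pred in labels  # bool counts as 0/1
    return correct / total

# real.json (from github.com/google-research/reassessed-imagenet) holds
# one list of label indices per validation image, in filename order.
with open("real.json") as f:
    real_labels = json.load(f)

# `predictions` (assumed, produced elsewhere): a list of 50,000 top-1
# class indices in the same filename order.
# print(f"ReaL accuracy: {real_accuracy(predictions, real_labels):.2%}")
```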
Results
Performance results of various models on this benchmark
| Model | Accuracy | Params | Paper |
|---|---|---|---|
| BiT-L | 90.54% | 928M | Big Transfer (BiT): General Visual Representation Learning |
| MAWS (ViT-6.5B) | 91.1% | - | The effectiveness of MAE pre-pretraining for billion-scale pretraining |
| ResMLP-36 | 85.6% | 45M | ResMLP: Feedforward networks for image classification with data-efficient training |
| Assemble ResNet-50 | 87.82% | - | Compounding the Performance Improvements of Assembled Techniques in a Convolutional Neural Network |
| ResMLP-B24/8 (22k) | - | - | ResMLP: Feedforward networks for image classification with data-efficient training |
| BiT-M | 89.02% | - | Big Transfer (BiT): General Visual Representation Learning |
| Model soups (ViT-G/14) | 91.20% | 1843M | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time |
| CeiT-T | 83.6% | - | Incorporating Convolution Designs into Visual Transformers |
| TokenLearner L/8 (24+11) | 91.05% | 460M | TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? |
| Meta Pseudo Labels (EfficientNet-L2) | 91.02% | - | Meta Pseudo Labels |
| ViTAE-H (MAE, 512) | 91.2% | 644M | ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond |
| Model soups (BASIC-L) | 91.03% | 2440M | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time |
| FixResNeXt-101 32x48d | 89.73% | 829M | Fixing the train-test resolution discrepancy |
| LeViT-384 | 87.5% | - | LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference |
| ViT-L @384 (DeiT III, 21k) | - | - | DeiT III: Revenge of the ViT |
| VOLO-D5 | 90.6% | - | VOLO: Vision Outlooker for Visual Recognition |
| ResMLP-12 | 84.6% | 15M | ResMLP: Feedforward networks for image classification with data-efficient training |
| NASNet-A Large | 87.56% | - | Learning Transferable Architectures for Scalable Image Recognition |
| Assemble-ResNet152 | 88.65% | - | Compounding the Performance Improvements of Assembled Techniques in a Convolutional Neural Network |
| DeiT-Ti | 82.1% | 5M | Training data-efficient image transformers & distillation through attention |
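To connect the two columns, the sketch below (assuming the timm library is installed; `deit_tiny_patch16_224` is timm's usual registry name for the DeiT-Ti checkpoint listed above) shows how the Params figure is obtained and where ReaL scoring would slot into an evaluation loop:

```python
import timm   # assumption: timm installed, pretrained weights reachable
import torch

# DeiT-Ti from the table above.
model = timm.create_model("deit_tiny_patch16_224", pretrained=True).eval()

# Params column: total parameter count (~5M for DeiT-Ti).
n_params = sum(p.numel() for p in model.parameters())
print(f"Params: {n_params / 1e6:.1f}M")

# Accuracy column: preprocess the validation images with the model's own
# transform, collect top-1 predictions in filename order, then score them
# with real_accuracy() from the earlier sketch.
cfg = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**cfg)

@torch.no_grad()
def top1(batch):
    return model(batch).argmax(dim=1).tolist()

# Dataset/loader plumbing is elided; any ImageNet-val loader that
# preserves filename order will do.
```

Note that the Params column reflects model size only; it says nothing about inference cost at the evaluation resolution, which is why large entries such as ViT-G/14 and BASIC-L appear alongside much smaller models.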
Showing 20 of 57 benchmark entries.