HyperAI

Image Classification On Inaturalist 2018
Evaluation metric: Top-1 Accuracy

Evaluation results: performance of each model on this benchmark.
| Model | Top-1 Accuracy | Paper Title | Repository |
|---|---|---|---|
| OmniVec2 | 94.6% | OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning | - |
| OmniVec | 93.8% | OmniVec: Learning robust representations with cross modal sharing | - |
| InternImage-H | 92.6% | InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | - |
| MAWS (ViT-2B) | 91.3% | The effectiveness of MAE pre-pretraining for billion-scale pretraining | - |
| MetaFormer (MetaFormer-2,384,extra_info) | 88.7% | MetaFormer: A Unified Meta Framework for Fine-Grained Recognition | - |
| Hiera-H (448px) | 87.3% | Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles | - |
| MAE (ViT-H, 448) | 86.8% | Masked Autoencoders Are Scalable Vision Learners | - |
| SWAG (ViT H/14) | 86.0% | Revisiting Weakly Supervised Pre-Training of Visual Perception Models | - |
| SEER (RegNet10B - finetuned - 384px) | 84.7% | Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision | - |
| MetaFormer (MetaFormer-2,384) | 84.3% | MetaFormer: A Unified Meta Framework for Fine-Grained Recognition | - |
| OMNIVORE (Swin-L) | 84.1% | Omnivore: A Single Model for Many Visual Modalities | - |
| RDNet-L (224 res, IN-1K pretrained) | 81.8% | DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | - |
| RegNet-8GF | 81.2% | Grafit: Learning fine-grained image representations with coarse labels | - |
| VL-LTR (ViT-B-16) | 81.0% | VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition | - |
| µ2Net+ (ViT-L/16) | 80.97% | A Continual Development Methodology for Large-scale Multitask Dynamic ML Systems | - |
| RDNet-B (224 res, IN-1K pretrained) | 80.5% | DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | - |
| MixMIM-L | 80.3% | MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers | - |
| DeiT-B | 79.5% | Training data-efficient image transformers & distillation through attention | - |
| CeiT-S (384 finetune resolution) | 79.4% | Incorporating Convolution Designs into Visual Transformers | - |
| RDNet-S (224 res, IN-1K pretrained) | 79.1% | DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | - |