HyperAI
Semantic Segmentation
Semantic Segmentation on ADE20K
Metrics
GFLOPs
Params (M)
Validation mIoU
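The leaderboard ranks models by validation mIoU (mean Intersection-over-Union). As a reminder of what that metric measures, here is a minimal NumPy sketch — illustrative only, not HyperAI's or the benchmark's official evaluation code — that computes mean IoU over the classes present in either the prediction or the ground truth:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union across classes.

    pred, target: integer class-label arrays of the same shape.
    Classes absent from both pred and target are skipped, so they
    do not drag the mean down (a common convention; exact handling
    varies between benchmark toolkits).
    """
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent everywhere; skip it
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x2 segmentation with two classes:
pred = np.array([[0, 1], [1, 1]])
target = np.array([[0, 1], [0, 1]])
print(mean_iou(pred, target, num_classes=2))  # mean of 0.5 (class 0) and 2/3 (class 1)
```

On ADE20K, `num_classes` would be 150 and the score is averaged over the full validation set; the per-class skip rule above is one common convention, and real toolkits typically accumulate intersection/union counts over all images before dividing.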
Results
Performance results of various models on this benchmark
| Model Name | GFLOPs | Params (M) | Validation mIoU | Paper Title |
|---|---|---|---|---|
| ONE-PEACE | - | 1500 | 63.0 | ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities |
| M3I Pre-training (InternImage-H) | - | 1310 | 62.9 | Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information |
| InternImage-H | 4635 | 1310 | 62.9 | InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions |
| BEiT-3 | - | 1900 | 62.8 | Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks |
| EVA | - | 1074 | 62.3 | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale |
| ViT-Adapter-L (Mask2Former, BEiTv2 pretrain) | - | 571 | 61.5 | Vision Transformer Adapter for Dense Predictions |
| FD-SwinV2-G | - | 3000 | 61.4 | Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation |
| RevCol-H (Mask2Former) | - | 2439 | 61.0 | Reversible Column Networks |
| Mask DINO (SwinL, multi-scale) | - | 223 | 60.8 | Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation |
| ViT-Adapter-L (Mask2Former, BEiT pretrain) | - | 571 | 60.5 | Vision Transformer Adapter for Dense Predictions |
| DINOv2 (ViT-g/14 frozen model, w/ ViT-Adapter + Mask2former) | - | 1080 | 60.2 | DINOv2: Learning Robust Visual Features without Supervision |
| SwinV2-G (UperNet) | - | - | 59.9 | Swin Transformer V2: Scaling Up Capacity and Resolution |
| SERNet-Former | - | - | 59.35 | SERNet-Former: Semantic Segmentation by Efficient Residual Network with Attention-Boosting Gates and Attention-Fusion Networks |
| FocalNet-L (Mask2Former) | - | - | 58.5 | Focal Modulation Networks |
| ViT-Adapter-L (UperNet, BEiT pretrain) | - | 451 | 58.4 | Vision Transformer Adapter for Dense Predictions |
| RSSeg-ViT-L (BEiT pretrain) | - | 330 | 58.4 | Representation Separation for Semantic Segmentation with Vision Transformers |
| SeMask (SeMask Swin-L MSFaPN-Mask2Former) | - | - | 58.2 | SeMask: Semantically Masked Transformers for Semantic Segmentation |
| SegViT-v2 (BEiT-v2-Large) | - | - | 58.2 | SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers |
| SeMask (SeMask Swin-L FaPN-Mask2Former) | - | - | 58.2 | SeMask: Semantically Masked Transformers for Semantic Segmentation |
| DiNAT-L (Mask2Former) | - | - | 58.1 | Dilated Neighborhood Attention Transformer |
Showing the top 20 of 230 entries.