HyperAI
Semantic Segmentation on ADE20K
Metrics: GFLOPs, Params (M), Validation mIoU

Performance results of various models on this benchmark:

| Model Name | GFLOPs | Params (M) | Validation mIoU | Paper Title |
|---|---|---|---|---|
| ONE-PEACE | - | 1500 | 63.0 | ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities |
| M3I Pre-training (InternImage-H) | - | 1310 | 62.9 | Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information |
| InternImage-H | 4635 | 1310 | 62.9 | InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions |
| BEiT-3 | - | 1900 | 62.8 | Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks |
| EVA | - | 1074 | 62.3 | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale |
| ViT-Adapter-L (Mask2Former, BEiTv2 pretrain) | - | 571 | 61.5 | Vision Transformer Adapter for Dense Predictions |
| FD-SwinV2-G | - | 3000 | 61.4 | Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation |
| RevCol-H (Mask2Former) | - | 2439 | 61.0 | Reversible Column Networks |
| Mask DINO (SwinL, multi-scale) | - | 223 | 60.8 | Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation |
| ViT-Adapter-L (Mask2Former, BEiT pretrain) | - | 571 | 60.5 | Vision Transformer Adapter for Dense Predictions |
| DINOv2 (ViT-g/14 frozen model, w/ ViT-Adapter + Mask2Former) | - | 1080 | 60.2 | DINOv2: Learning Robust Visual Features without Supervision |
| SwinV2-G (UperNet) | - | - | 59.9 | Swin Transformer V2: Scaling Up Capacity and Resolution |
| SERNet-Former | - | - | 59.35 | SERNet-Former: Semantic Segmentation by Efficient Residual Network with Attention-Boosting Gates and Attention-Fusion Networks |
| FocalNet-L (Mask2Former) | - | - | 58.5 | Focal Modulation Networks |
| ViT-Adapter-L (UperNet, BEiT pretrain) | - | 451 | 58.4 | Vision Transformer Adapter for Dense Predictions |
| RSSeg-ViT-L (BEiT pretrain) | - | 330 | 58.4 | Representation Separation for Semantic Segmentation with Vision Transformers |
| SeMask (SeMask Swin-L MSFaPN-Mask2Former) | - | - | 58.2 | SeMask: Semantically Masked Transformers for Semantic Segmentation |
| SegViT-v2 (BEiT-v2-Large) | - | - | 58.2 | SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers |
| SeMask (SeMask Swin-L FaPN-Mask2Former) | - | - | 58.2 | SeMask: Semantically Masked Transformers for Semantic Segmentation |
| DiNAT-L (Mask2Former) | - | - | 58.1 | Dilated Neighborhood Attention Transformer |
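For reference, the Validation mIoU column is the mean Intersection-over-Union over ADE20K's 150 semantic classes, computed on the validation set. A minimal sketch of the metric (the function name and the `ignore_index=255` convention for unlabeled pixels are illustrative assumptions, not part of this leaderboard):

```python
import numpy as np

def mean_iou(pred, target, num_classes=150, ignore_index=255):
    """Mean Intersection-over-Union over semantic classes.

    pred, target: integer label maps of identical shape.
    Pixels labeled ignore_index (assumed convention for "unlabeled")
    are excluded; classes absent from both maps are skipped rather
    than counted as zero IoU.
    """
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:  # class absent from both prediction and ground truth
            continue
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))
```

A prediction identical to the ground truth scores 1.0; leaderboard values such as 63.0 correspond to this quantity expressed as a percentage.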
Showing 20 of 230 entries.