Semantic Segmentation
Semantic Segmentation on ADE20K
Metrics
GFLOPs
Params (M)
Validation mIoU
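Of these metrics, Validation mIoU is the one the leaderboard ranks by: the mean Intersection-over-Union between predicted and ground-truth label maps, averaged over classes, on the ADE20K validation split. A minimal sketch of the computation follows; the function name and the rule of skipping classes absent from both maps are illustrative assumptions, not HyperAI's or ADE20K's official evaluation code.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union (mIoU) between two integer label maps.

    pred, target: arrays of equal shape holding class indices.
    Classes absent from both maps are skipped so they do not skew the mean
    (an assumption here; evaluation protocols differ on this detail).
    """
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class c appears in neither map
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x2 example with 2 classes:
# class 0: intersection 1, union 2 -> IoU 0.5
# class 1: intersection 2, union 3 -> IoU 2/3
pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
print(mean_iou(pred, target, num_classes=2))  # → 0.5833333333333333
```

Benchmark implementations typically accumulate a confusion matrix over the whole validation set and compute per-class IoU from it, rather than averaging per-image scores as the toy call above might suggest.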
Results
Performance results of the different models on this benchmark:
| Model | GFLOPs | Params (M) | Validation mIoU | Paper Title |
| --- | --- | --- | --- | --- |
| ONE-PEACE | - | 1500 | 63.0 | ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities |
| M3I Pre-training (InternImage-H) | - | 1310 | 62.9 | Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information |
| InternImage-H | 4635 | 1310 | 62.9 | InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions |
| BEiT-3 | - | 1900 | 62.8 | Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks |
| EVA | - | 1074 | 62.3 | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale |
| ViT-Adapter-L (Mask2Former, BEiTv2 pretrain) | - | 571 | 61.5 | Vision Transformer Adapter for Dense Predictions |
| FD-SwinV2-G | - | 3000 | 61.4 | Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation |
| RevCol-H (Mask2Former) | - | 2439 | 61.0 | Reversible Column Networks |
| Mask DINO (SwinL, multi-scale) | - | 223 | 60.8 | Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation |
| ViT-Adapter-L (Mask2Former, BEiT pretrain) | - | 571 | 60.5 | Vision Transformer Adapter for Dense Predictions |
| DINOv2 (ViT-g/14 frozen model, w/ ViT-Adapter + Mask2former) | - | 1080 | 60.2 | DINOv2: Learning Robust Visual Features without Supervision |
| SwinV2-G (UperNet) | - | - | 59.9 | Swin Transformer V2: Scaling Up Capacity and Resolution |
| SERNet-Former | - | - | 59.35 | SERNet-Former: Semantic Segmentation by Efficient Residual Network with Attention-Boosting Gates and Attention-Fusion Networks |
| FocalNet-L (Mask2Former) | - | - | 58.5 | Focal Modulation Networks |
| ViT-Adapter-L (UperNet, BEiT pretrain) | - | 451 | 58.4 | Vision Transformer Adapter for Dense Predictions |
| RSSeg-ViT-L (BEiT pretrain) | - | 330 | 58.4 | Representation Separation for Semantic Segmentation with Vision Transformers |
| SeMask (SeMask Swin-L MSFaPN-Mask2Former) | - | - | 58.2 | SeMask: Semantically Masked Transformers for Semantic Segmentation |
| SegViT-v2 (BEiT-v2-Large) | - | - | 58.2 | SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers |
| SeMask (SeMask Swin-L FaPN-Mask2Former) | - | - | 58.2 | SeMask: Semantically Masked Transformers for Semantic Segmentation |
| DiNAT-L (Mask2Former) | - | - | 58.1 | Dilated Neighborhood Attention Transformer |
The table shows the top 20 of 230 benchmark entries.