HyperAI

Semantic Segmentation on ADE20K val

Metrics

mIoU
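
mIoU (mean intersection over union) averages the per-class IoU, where each class's IoU is the number of pixels correctly assigned to that class divided by the number of pixels that are either predicted as or labeled with that class. The snippet below is a minimal sketch in plain Python, not the evaluation code behind this leaderboard. The function name `mean_iou`, the flat label lists, and the `ignore_index` value of 255 are assumptions chosen for illustration.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flat sequences of integer class labels
    (an assumed layout); pixels labeled ignore_index are skipped.
    """
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel, excluded from the metric
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both pred and target
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with 3 classes; the last pixel is ignored.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
print(round(100 * mean_iou(pred, target, num_classes=3), 2))  # 72.22
```

The scores in the table are on this 0 to 100 scale. Evaluation toolkits usually accumulate the intersection and union counts over the whole validation set before dividing, rather than averaging per-image scores, so the loop above would run over all pixels of all images.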

Results

Performance results of various models on this benchmark

| Model Name | mIoU | Paper Title | Repository |
| --- | --- | --- | --- |
| SeMask (SeMask Swin-L FaPN-Mask2Former) | 58.2 | SeMask: Semantically Masked Transformers for Semantic Segmentation | |
| Auto-DeepLab-L | 43.98 | Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation | |
| BEiT-L (ViT+UperNet, ImageNet-22k pretrain) | 57.0 | BEiT: BERT Pre-Training of Image Transformers | |
| Swin-L (UperNet, ImageNet-22k pretrain) | 53.5 | Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | |
| EVA | 61.5 | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale | |
| MixMIM-B | 50.3 | MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers | |
| SeMask (SeMask Swin-L MSFaPN-Mask2Former, single-scale) | 57.0 | SeMask: Semantically Masked Transformers for Semantic Segmentation | |
| PatchConvNet-B120 (UperNet) | 52.8 | Augmenting Convolutional networks with attention-based aggregation | |
| Mask2Former (Swin-L-FaPN, multiscale) | 57.7 | Masked-attention Mask Transformer for Universal Image Segmentation | |
| Twins-SVT-L (UperNet, ImageNet-1k pretrain) | 50.2 | Twins: Revisiting the Design of Spatial Attention in Vision Transformers | |
| DNL | 45.97 | Disentangled Non-Local Neural Networks | |
| DPT-Hybrid | 49.02 | Vision Transformers for Dense Prediction | |
| DCNAS | 47.12 | DCNAS: Densely Connected Neural Architecture Search for Semantic Image Segmentation | - |
| ACNet (ResNet-101) | 45.90 | Adaptive Context Network for Scene Parsing | - |
| Swin-S (RPE w/ GAB) | 46.41 | Understanding Gaussian Attention Bias of Vision Transformers Using Effective Receptive Fields | |
| Mask2Former (Swin-L-FaPN) | 56.4 | Masked-attention Mask Transformer for Universal Image Segmentation | |
| OneFormer (InternImage-H, emb_dim=256, multi-scale, 896x896) | 60.8 | OneFormer: One Transformer to Rule Universal Image Segmentation | |
| ViT-Adapter-L (UperNet, BEiT pretrain) | 58.4 | Vision Transformer Adapter for Dense Predictions | |
| Light-Ham (VAN-Large, 46M, IN-1k, MS) | 51.0 | Is Attention Better Than Matrix Decomposition? | |
| OCR (ResNet-101) | 45.28 | Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation | |