Semantic Segmentation on ADE20K val

Metrics

mIoU (mean Intersection over Union, reported in %)
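As a rough illustration of the metric, the sketch below computes mIoU from flat lists of predicted and ground-truth class labels. The function name `mean_iou`, its parameters, and the toy labels are assumptions made for this example and do not come from any of the listed papers. It also assumes that classes absent from both prediction and ground truth are skipped; benchmark toolkits differ in such details, and they usually accumulate intersections and unions over the whole validation set rather than per image.

```python
from collections import Counter


def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flat sequences of integer class labels
    (one label per pixel) of equal length. Hypothetical helper.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = Counter()  # pixels where pred == target == c
    pred_area = Counter()     # pixels predicted as class c
    target_area = Counter()   # pixels labelled as class c
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip unlabelled pixels
        pred_area[p] += 1
        target_area[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_area[c] + target_area[c] - intersection[c]
        if union == 0:
            continue  # class absent from both: assumed to be skipped
        ious.append(intersection[c] / union)
    if not ious:
        raise ValueError("no valid classes to average over")
    return sum(ious) / len(ious)


# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {100 * score:.2f}")  # per-class IoUs are 1/3, 2/3, 1/2
```

Concatenating the labels of every validation image before calling the function gives a dataset-level score, which is closer to how leaderboard numbers are typically produced.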

Results

Performance results of various models on this benchmark

Model Name	mIoU	Paper Title	Repository
SeMask (SeMask Swin-L FaPN-Mask2Former)	58.2	SeMask: Semantically Masked Transformers for Semantic Segmentation
Auto-DeepLab-L	43.98	Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation
BEiT-L (ViT+UperNet, ImageNet-22k pretrain)	57.0	BEiT: BERT Pre-Training of Image Transformers
Swin-L (UperNet, ImageNet-22k pretrain)	53.5	Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
EVA	61.5	EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
MixMIM-B	50.3	MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers
SeMask (SeMask Swin-L MSFaPN-Mask2Former, single-scale)	57.0	SeMask: Semantically Masked Transformers for Semantic Segmentation
PatchConvNet-B120 (UperNet)	52.8	Augmenting Convolutional networks with attention-based aggregation
Mask2Former (Swin-L-FaPN, multiscale)	57.7	Masked-attention Mask Transformer for Universal Image Segmentation
Twins-SVT-L (UperNet, ImageNet-1k pretrain)	50.2	Twins: Revisiting the Design of Spatial Attention in Vision Transformers
DNL	45.97	Disentangled Non-Local Neural Networks
DPT-Hybrid	49.02	Vision Transformers for Dense Prediction
DCNAS	47.12	DCNAS: Densely Connected Neural Architecture Search for Semantic Image Segmentation	-
ACNet (ResNet-101)	45.90	Adaptive Context Network for Scene Parsing	-
Swin-S (RPE w/ GAB)	46.41	Understanding Gaussian Attention Bias of Vision Transformers Using Effective Receptive Fields
Mask2Former (Swin-L-FaPN)	56.4	Masked-attention Mask Transformer for Universal Image Segmentation
OneFormer (InternImage-H, emb_dim=256, multi-scale, 896x896)	60.8	OneFormer: One Transformer to Rule Universal Image Segmentation
ViT-Adapter-L (UperNet, BEiT pretrain)	58.4	Vision Transformer Adapter for Dense Predictions
Light-Ham (VAN-Large, 46M, IN-1k, MS)	51.0	Is Attention Better Than Matrix Decomposition?
OCR (ResNet-101)	45.28	Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation
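The rows above are not ordered by score. A minimal sketch of ranking them is shown below, assuming the table is available as tab-separated text. Only three rows are reproduced here, the Repository column is omitted, and the helper name `rank_by_miou` is hypothetical.

```python
import csv
import io

# Three rows copied from the table above, tab-separated as in the source.
TABLE = (
    "Model Name\tmIoU\tPaper Title\n"
    "EVA\t61.5\tEVA: Exploring the Limits of Masked Visual Representation Learning at Scale\n"
    "DNL\t45.97\tDisentangled Non-Local Neural Networks\n"
    "DPT-Hybrid\t49.02\tVision Transformers for Dense Prediction\n"
)


def rank_by_miou(tsv_text):
    """Return (model name, mIoU) pairs sorted from highest to lowest mIoU."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = [(row["Model Name"], float(row["mIoU"])) for row in reader]
    return sorted(rows, key=lambda r: r[1], reverse=True)


ranking = rank_by_miou(TABLE)
for name, miou in ranking:
    print(f"{miou:6.2f}  {name}")
```

Note that entries differ in backbone, pretraining data, and single-scale versus multi-scale testing, so a plain sort by mIoU is not a like-for-like comparison.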
