HyperAI

Panoptic Segmentation on ADE20K val

Metrics

AP
PQ
mIoU
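For readers unfamiliar with PQ (panoptic quality), the following is a minimal sketch of the standard per-class definition: the sum of IoUs over matched segment pairs, divided by TP + 0.5·FP + 0.5·FN. The function name and its arguments are hypothetical, chosen only for illustration. It assumes segment matching (IoU > 0.5) has already been done by the caller. It is not the evaluation code used to produce the numbers on this page.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Per-class PQ, assuming `matched_ious` holds the IoU of every
    predicted/ground-truth segment pair already matched with IoU > 0.5.

    All names here are illustrative assumptions, not a library API.
    """
    num_true_positives = len(matched_ious)
    denominator = (
        num_true_positives
        + 0.5 * num_false_positives
        + 0.5 * num_false_negatives
    )
    if denominator == 0:
        # No segments of this class in prediction or ground truth.
        return 0.0
    return sum(matched_ious) / denominator


# Two matched segments, one unmatched prediction, one unmatched ground truth.
pq = panoptic_quality([0.8, 0.6], num_false_positives=1, num_false_negatives=1)
print(f"PQ = {100 * pq:.1f}")
```

The benchmark figure is typically the mean of this quantity over all classes, reported as a percentage.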

Results

Performance results of various models on this benchmark

| Model Name | AP | PQ | mIoU | Paper Title | Repository |
|---|---|---|---|---|---|
| Mask2Former (Swin-L) | 34.2 | 48.1 | 54.5 | Masked-attention Mask Transformer for Universal Image Segmentation | |
| OneFormer (ConvNeXt-L, single-scale, 640x640) | 36.2 | 50.0 | 56.6 | OneFormer: One Transformer to Rule Universal Image Segmentation | |
| OpenSeed (SwinL, single scale, 1280x1280) | - | 53.7 | - | A Simple Framework for Open-Vocabulary Segmentation and Detection | |
| OneFormer (DiNAT-L, single-scale, 640x640) | 36.0 | 50.5 | 58.3 | OneFormer: One Transformer to Rule Universal Image Segmentation | |
| X-Decoder (Davit-d5, Deform, single-scale, 1280x1280) | 38.7 | 52.4 | 59.1 | Generalized Decoding for Pixel, Image, and Language | - |
| DiNAT-L (Mask2Former, 640x640) | 35.0 | 49.4 | 56.3 | Dilated Neighborhood Attention Transformer | |
| X-Decoder (L) | 35.8 | 49.6 | 58.1 | Generalized Decoding for Pixel, Image, and Language | - |
| Mask2Former (ResNet-50, 640x640) | 26.5 | - | 46.1 | Masked-attention Mask Transformer for Universal Image Segmentation | |
| Mask2Former (ResNet-50, 640x640) | - | 39.7 | - | Masked-attention Mask Transformer for Universal Image Segmentation | |
| kMaX-DeepLab (ResNet50, single-scale, 1281x1281) | - | 42.3 | 45.3 | kMaX-DeepLab: k-means Mask Transformer | |
| kMaX-DeepLab (ConvNeXt-L, single-scale, 1281x1281) | - | 50.9 | 55.2 | kMaX-DeepLab: k-means Mask Transformer | |
| Mask2Former (Swin-L + FAPN, 640x640) | 33.2 | 46.2 | 55.4 | Masked-attention Mask Transformer for Universal Image Segmentation | |
| kMaX-DeepLab (ConvNeXt-L, single-scale, 641x641) | - | 48.7 | 54.8 | kMaX-DeepLab: k-means Mask Transformer | |
| OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896) | 40.2 | 54.5 | 60.4 | OneFormer: One Transformer to Rule Universal Image Segmentation | |
| OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-Pretrain) | - | 53.4 | 58.9 | OneFormer: One Transformer to Rule Universal Image Segmentation | |
| MaskFormer (R101 + 6 Enc) | - | 35.7 | - | Per-Pixel Classification is Not All You Need for Semantic Segmentation | |
| OneFormer (ConvNeXt-XL, single-scale, 640x640) | 36.3 | 50.1 | 57.4 | OneFormer: One Transformer to Rule Universal Image Segmentation | |
| OneFormer (DiNAT-L, single-scale, 1280x1280) | 37.1 | 51.5 | 58.3 | OneFormer: One Transformer to Rule Universal Image Segmentation | |
| OneFormer (Swin-L, single-scale, 1280x1280) | 37.8 | 51.4 | 57.0 | OneFormer: One Transformer to Rule Universal Image Segmentation | |
| kMaX-DeepLab (ResNet50, single-scale, 641x641) | - | 41.5 | 45.0 | kMaX-DeepLab: k-means Mask Transformer | |