Panoptic-DeepLab (SWideRNet-(1, 1, 4.5), multi-scale) | 44.8 | 51.9 | 39.3 | 60.0 | Scaling Wide Residual Networks for Panoptic Segmentation | |
Panoptic FCN* (ResNet-FPN) | 36.9 | - | 32.9 | - | Fully Convolutional Networks for Panoptic Segmentation | |
OneFormer (DiNAT-L, single-scale) | 46.7 | 54.9 | 40.5 | 61.7 | OneFormer: One Transformer to Rule Universal Image Segmentation | |
Panoptic FCN* (ResNet-50-FPN) | - | 42.3 | - | - | Fully Convolutional Networks for Panoptic Segmentation | |
AdaptIS (ResNeXt-101) | 40.3 | - | - | 56.8 | AdaptIS: Adaptive Instance Selection Network | |
Axial-DeepLab-L (multi-scale) | 41.1 | 51.3 | 33.4 | 58.4 | Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation | |
HRNet-OCR (Hierarchical Multi-Scale Attention) | 17.6 | - | - | - | Hierarchical Multi-Scale Attention for Semantic Segmentation | |
Panoptic FCN* (Swin-L, single-scale) | 45.7 | 52.1 | 40.8 | - | Fully Convolutional Networks for Panoptic Segmentation | |
OneFormer (ConvNeXt-L, single-scale) | 46.4 | 54.0 | 40.6 | 61.6 | OneFormer: One Transformer to Rule Universal Image Segmentation | |
Mask2Former + Intra-Batch Supervision (ResNet-50) | 42.2 | 52.0 | 34.9 | - | Intra-Batch Supervision for Panoptic Segmentation on High-Resolution Images | |