ViT-Adapter-L (Mask2Former, BEiT pretrain) | 68.2 | Vision Transformer Adapter for Dense Predictions | |
LaU-regression-loss (ResNet-101) | 53.9 | Location-aware Upsampling for Semantic Segmentation | |
CAA + CAR (ConvNeXt-Large + JPU) | 64.1 | CAR: Class-aware Regularizations for Semantic Segmentation | |
HRNetV2 + OCR + RMI (PaddleClas pretrained) | 59.6 | Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation | |