Semantic Segmentation on ADE20K val
Metrics
mIoU
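Mean Intersection-over-Union (mIoU) averages, over all classes, the ratio of correctly predicted pixels of a class to the union of predicted and ground-truth pixels of that class. A minimal NumPy sketch (the function name `mean_iou` and the toy label maps are illustrative, not from any specific library):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union over classes.

    pred, target: integer label arrays of identical shape.
    Classes absent from both prediction and ground truth are skipped.
    """
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class not present in either map
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x3 label maps with 3 classes
pred = np.array([[0, 0, 1], [1, 1, 2]])
target = np.array([[0, 0, 1], [1, 2, 2]])
print(round(mean_iou(pred, target, 3), 3))  # → 0.722
```

The scores in the table below are mIoU values expressed as percentages on the ADE20K validation set.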
Results
Performance results of various models on this benchmark
Comparison table
Model name | mIoU |
---|---|
semask-semantically-masked-transformers-for-1 | 58.2 |
auto-deeplab-hierarchical-neural-architecture | 43.98 |
beit-bert-pre-training-of-image-transformers | 57.0 |
swin-transformer-hierarchical-vision | 53.5 |
eva-exploring-the-limits-of-masked-visual | 61.5 |
mixmim-mixed-and-masked-image-modeling-for | 50.3 |
semask-semantically-masked-transformers-for-1 | 57.0 |
augmenting-convolutional-networks-with | 52.8 |
masked-attention-mask-transformer-for | 57.7 |
twins-revisiting-spatial-attention-design-in | 50.2 |
disentangled-non-local-neural-networks | 45.97 |
vision-transformers-for-dense-prediction | 49.02 |
dcnas-densely-connected-neural-architecture | 47.12 |
adaptive-context-network-for-scene-parsing-1 | 45.90 |
understanding-gaussian-attention-bias-of | 46.41 |
masked-attention-mask-transformer-for | 56.4 |
oneformer-one-transformer-to-rule-universal | 60.8 |
vision-transformer-adapter-for-dense | 58.4 |
is-attention-better-than-matrix-decomposition-1 | 51.0 |
object-contextual-representations-for | 45.28 |
sernet-former-semantic-segmentation-by | 59.35 |
resnest-split-attention-networks | 47.60 |
deit-iii-revenge-of-the-vit | 55.6 |
gswin-gated-mlp-vision-model-with | 47.63 |
elsa-enhanced-local-self-attention-for-vision | 50.3 |
context-prior-for-scene-segmentation | 46.27 |
semask-semantically-masked-transformers-for-1 | 56.2 |
fapn-feature-aligned-pyramid-network-for | 56.7 |
vision-transformer-adapter-for-dense | 60.5 |
Model 30 | 46.9 |
segformer-simple-and-efficient-design-for | 51.8 |
is-attention-better-than-matrix-decomposition-1 | 49.6 |
segmenter-transformer-for-semantic | 49.61 |
asymmetric-non-local-neural-networks-for | 45.24 |
ctnet-context-based-tandem-network-for | 45.94 |
refinenet-multi-path-refinement-networks-for | 40.70 |
improve-vision-transformers-training-by | 54.4 |
pyramid-scene-parsing-network | 43.29 |
semask-semantically-masked-transformers-for-1 | 53.5 |
oneformer-one-transformer-to-rule-universal | 57.7 |
gswin-gated-mlp-vision-model-with | 49.69 |
rethinking-decoders-for-transformer-based | 52.9 |
unified-perceptual-parsing-for-scene | 42.66 |
high-resolution-representations-for-labeling | 42.99 |
symbolic-graph-reasoning-meets-convolutions | 44.32 |
mask-dino-towards-a-unified-transformer-based-1 | 60.8 |
context-encoding-for-semantic-segmentation | 44.65 |
is-attention-better-than-matrix-decomposition-1 | 51.5 |
pyramidal-convolution-rethinking | 45.99 |
representation-separation-for-semantic | 58.4 |
vit-comer-vision-transformer-with | 62.1 |
multimae-multi-modal-multi-task-masked | 46.2 |
oneformer-one-transformer-to-rule-universal | 58.6 |
augmenting-convolutional-networks-with | 52.9 |
semask-semantically-masked-transformers-for-1 | 57.5 |
object-contextual-representations-for | 45.66 |
segmenter-transformer-for-semantic | 53.63 |
mixmim-mixed-and-masked-image-modeling-for | 53.8 |
rethinking-decoders-for-transformer-based | 54.3 |
shuffle-transformer-rethinking-spatial | 50.5 |
efficient-self-ensemble-framework-for-1 | 57.1 |
crossformer-a-versatile-vision-transformer | 51.4 |
davit-dual-attention-vision-transformers | 46.3 |
object-contextual-representations-for | 47.98 |
psanet-point-wise-spatial-attention-network | 43.77 |
resnest-split-attention-networks | 48.36 |
augmenting-convolutional-networks-with | 49.3 |
davit-dual-attention-vision-transformers | 48.8 |
oneformer-one-transformer-to-rule-universal | 58.3 |
adaptive-context-network-for-scene-parsing-1 | 45.90 |
deit-iii-revenge-of-the-vit | 54.1 |
pyramid-scene-parsing-network | 43.51 |
oneformer-one-transformer-to-rule-universal | 58.4 |
image-as-a-foreign-language-beit-pretraining | 62.8 |
augmenting-convolutional-networks-with | 51.1 |
contrastive-learning-rivals-masked-image | 61.4 |
k-net-towards-unified-image-segmentation | 54.3 |
shuffle-transformer-rethinking-spatial | 49.6 |
per-pixel-classification-is-not-all-you-need | 55.6 |
efficient-self-ensemble-framework-for-1 | 54.2 |
segmenter-transformer-for-semantic | 50.0 |
shuffle-transformer-rethinking-spatial | 47.6 |
focal-self-attention-for-local-global | 55.4 |
resnest-split-attention-networks | 46.91 |
segvit-semantic-segmentation-with-plain | 55.2 |
cswin-transformer-a-general-vision | 55.7 |
gswin-gated-mlp-vision-model-with | 45.07 |
semask-semantically-masked-transformers-for-1 | 58.2 |
beyond-self-attention-external-attention | 45.33 |
dilated-neighborhood-attention-transformer | 58.1 |
refinenet-multi-path-refinement-networks-for | 40.20 |
dynamic-structured-semantic-propagation | 43.68 |
location-aware-upsampling-for-semantic | 45.02 |
swin-transformer-hierarchical-vision | 49.7 |