Semantic Segmentation On Ade20K
Evaluation Metrics
GFLOPs
Params (M)
Validation mIoU
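Validation mIoU (mean Intersection-over-Union) averages, over all classes, the overlap between predicted and ground-truth masks divided by their union. A minimal sketch of the metric, assuming integer label maps and skipping classes absent from both prediction and ground truth (a common but not universal convention):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean Intersection-over-Union over all classes.

    pred, gt: integer label maps of the same shape.
    Classes absent from both prediction and ground truth are skipped.
    """
    ious = []
    for c in range(num_classes):
        p = pred == c
        g = gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class not present anywhere; exclude from the mean
        inter = np.logical_and(p, g).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x2 label maps with two classes
pred = np.array([[0, 0], [1, 1]])
gt = np.array([[0, 1], [1, 1]])
print(mean_iou(pred, gt, num_classes=2))  # (1/2 + 2/3) / 2 ≈ 0.583
```

Benchmark implementations typically accumulate a confusion matrix over the whole validation set before dividing, rather than averaging per-image IoUs; the per-class formula is the same.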
Evaluation Results
Performance results of each model on this benchmark
Comparison Table
Model Name | GFLOPs | Params (M) | Validation mIoU |
---|---|---|---|
internimage-exploring-large-scale-vision | 2526 | 256 | 54.1 |
towards-sustainable-self-supervised-learning | - | - | 51.0 |
efficientvit-enhanced-linear-attention-for | - | - | 49 |
visual-attention-network | - | 49 | 48.1 |
per-pixel-classification-is-not-all-you-need | - | - | 53.8 |
co-occurrent-features-in-semantic | - | - | 44.89 |
a-convnet-for-the-2020s | - | 82 | 49.6 |
segnet-a-deep-convolutional-encoder-decoder | - | - | 21.64 |
towards-all-in-one-pre-training-via | - | 1310 | 62.9 |
hrvit-multi-scale-high-resolution-vision | - | 20.8 | 48.76 |
swin-transformer-v2-scaling-up-capacity-and | - | - | 53.7 |
rethinking-semantic-segmentation-from-a | - | - | 50.28 |
elsa-enhanced-local-self-attention-for-vision | - | - | 50.3 |
sequential-ensembling-for-semantic | - | - | 46.8 |
fastvit-a-fast-hybrid-vision-transformer | - | - | - |
adaptive-context-network-for-scene-parsing-1 | - | - | 45.90 |
symbolic-graph-reasoning-meets-convolutions | - | - | 44.32 |
convmlp-hierarchical-convolutional-mlps-for | - | - | 35.8 |
semask-semantically-masked-transformers-for-1 | - | - | 58.2 |
convmlp-hierarchical-convolutional-mlps-for | - | - | 38.6 |
dat-spatially-dynamic-vision-transformer-with | - | - | 51.2 |
segformer-simple-and-efficient-design-for | - | 84.7 | 51.8 |
unireplknet-a-universal-perception-large | - | - | 49.1 |
fbnetv5-neural-architecture-search-for | - | - | 40.4 |
neighborhood-attention-transformer | - | 58 | 48.4 |
vision-transformer-adapter-for-dense | - | 451 | 58.4 |
hrvit-multi-scale-high-resolution-vision | - | 8.2 | 45.88 |
when-shift-operation-meets-vision-transformer | - | - | 47.9 |
self-supervised-learning-with-swin | - | - | 45.58 |
dilated-neighborhood-attention-transformer | - | - | 50.4 |
conditional-boundary-loss-for-semantic | - | - | 54.9 |
contrastive-learning-rivals-masked-image | - | 3000 | 61.4 |
could-giant-pretrained-image-models-extract | - | - | 57.6 |
vision-transformer-adapter-for-dense | - | 571 | 60.5 |
context-prior-for-scene-segmentation | - | - | 46.27 |
dat-spatially-dynamic-vision-transformer-with | - | - | 50.3 |
location-aware-upsampling-for-semantic | - | - | 44.55 |
resnest-split-attention-networks | - | - | 48.36 |
text-image-alignment-for-diffusion-based | - | - | 55.9 |
when-shift-operation-meets-vision-transformer | - | - | 49.2 |
mask-dino-towards-a-unified-transformer-based-1 | - | 223 | 60.8 |
architecture-agnostic-masked-image-modeling | - | - | 38.3 |
debiformer-vision-transformer-with-deformable | - | - | 52.0 |
fully-convolutional-networks-for-semantic-1 | - | - | 29.39 |
segmenter-transformer-for-semantic | - | - | 50.0 |
segmenter-transformer-for-semantic | - | - | 49.61 |
multimae-multi-modal-multi-task-masked | - | - | 46.2 |
efficient-multi-order-gated-aggregation | - | - | 50.1 |
internimage-exploring-large-scale-vision | 4635 | 1310 | 62.9 |
vision-transformer-adapter-for-dense | - | 571 | 61.5 |
harnessing-diffusion-models-for-visual | - | - | 56.8 |
transnext-robust-foveal-visual-perception-for | - | 109 | 54.7 |
visual-attention-network | - | 8 | 38.5 |
exploring-target-representations-for-masked | - | - | 50.8 |
region-rebalance-for-long-tailed-semantic | - | - | 57.7 |
moat-alternating-mobile-convolution-and | - | 6 | 41.2 |
vision-transformer-with-deformable-attention | - | 60 | 45.54 |
resnest-split-attention-networks | - | - | 47.60 |
pyramid-scene-parsing-network | - | - | 43.29 |
exploring-target-representations-for-masked | - | - | 56.2 |
xcit-cross-covariance-image-transformers | - | - | 47.1 |
a-convnet-for-the-2020s | - | 122 | 53.1 |
moat-alternating-mobile-convolution-and | - | 198 | 56.5 |
xcit-cross-covariance-image-transformers | - | - | 46.9 |
efficient-multi-order-gated-aggregation | - | - | 50.9 |
improve-vision-transformers-training-by | - | - | 54.4 |
representation-separation-for-semantic | - | 330 | 58.4 |
dilated-neighborhood-attention-transformer | - | - | 48.8 |
unified-perceptual-parsing-for-scene | - | - | 42.66 |
neighborhood-attention-transformer | - | 50 | 46.4 |
eva-exploring-the-limits-of-masked-visual | - | 1074 | 62.3 |
masked-autoencoders-are-scalable-vision | - | - | 48.1 |
reversible-column-networks | - | 2439 | 61.0 |
is-attention-better-than-matrix-decomposition-1 | - | 27.4 | 49.6 |
global-context-vision-transformers | - | 58 | 46.5 |
unireplknet-a-universal-perception-large | - | - | 55 |
dilated-neighborhood-attention-transformer | - | - | 54.6 |
dcnas-densely-connected-neural-architecture | - | - | 47.12 |
dilated-neighborhood-attention-transformer | - | - | 49.9 |
a-convnet-for-the-2020s | - | 122 | 49.9 |
shuffle-transformer-rethinking-spatial | - | - | 47.6 |
refinenet-multi-path-refinement-networks-for | - | - | 40.7 |
focal-modulation-networks | - | - | 58.5 |
condnet-conditional-classifier-for-scene | - | - | 47.38 |
unireplknet-a-universal-perception-large | - | - | 53.9 |
xcit-cross-covariance-image-transformers | - | - | 48.1 |
condnet-conditional-classifier-for-scene | - | - | 47.54 |
augmenting-convolutional-networks-with | - | - | 49.3 |
moat-alternating-mobile-convolution-and | - | 81 | 54.7 |
fastfcn-rethinking-dilated-convolution-in-the | - | - | 44.34 |
disentangled-non-local-neural-networks | - | - | 45.97 |
masked-attention-mask-transformer-for | - | - | 57.7 |
dynamic-focus-aware-positional-queries-for | - | - | 57.7 |
shuffle-transformer-rethinking-spatial | - | - | 50.5 |
pyramid-scene-parsing-network | - | - | 43.51 |
sequential-ensembling-for-semantic | - | 216.3 | 54 |
context-autoencoder-for-self-supervised | - | - | 54.7 |
augmenting-convolutional-networks-with | - | - | 52.9 |
efficient-self-ensemble-framework-for-1 | - | - | 54.2 |
ibot-image-bert-pre-training-with-online | - | - | 45.4 |
internimage-exploring-large-scale-vision | 3142 | 368 | 55.3 |
convnext-v2-co-designing-and-scaling-convnets | - | - | 54.2 |
convnext-v2-co-designing-and-scaling-convnets | - | - | 49.9 |
segformer-simple-and-efficient-design-for | - | 64.1 | 51.1 |
vision-transformer-with-deformable-attention | - | 121 | 49.38 |
augmenting-convolutional-networks-with | - | - | 52.8 |
neighborhood-attention-transformer | - | 82 | 49.5 |
unireplknet-a-universal-perception-large | - | - | 52.7 |
conditional-boundary-loss-for-semantic | - | - | 56.1 |
scene-segmentation-with-dual-relation-aware | - | - | 46.18 |
segformer-simple-and-efficient-design-for | - | 3.8 | 37.4 |
k-net-towards-unified-image-segmentation | - | - | 54.3 |
fapn-feature-aligned-pyramid-network-for | - | - | 56.7 |
semask-semantically-masked-transformers-for-1 | - | 56 | 47.63 |
image-as-a-foreign-language-beit-pretraining | - | 1900 | 62.8 |
pyramidal-convolution-rethinking | - | - | 45.99 |
volo-vision-outlooker-for-visual-recognition | - | - | 54.3 |
transnext-robust-foveal-visual-perception-for | - | 47.5 | 53.4 |
xcit-cross-covariance-image-transformers | - | - | 48.4 |
generalized-parametric-contrastive-learning | - | - | 54.3 |
masked-attention-mask-transformer-for | - | - | 56.4 |
davit-dual-attention-vision-transformers | - | - | 46.3 |
unireplknet-a-universal-perception-large | - | - | 55.6 |
muxconv-information-multiplexing-in | - | - | 35.8 |
global-context-vision-transformers | - | 84 | 48.3 |
beyond-self-attention-external-attention | - | - | 45.33 |
moat-alternating-mobile-convolution-and | - | 24 | 47.5 |
biformer-vision-transformer-with-bi-level | - | - | 50.8 |
unireplknet-a-universal-perception-large | - | - | 51 |
visual-attention-network | - | 18 | 42.9 |
a-convnet-for-the-2020s | - | 60 | 46.7 |
hornet-efficient-high-order-spatial | - | - | 57.9 |
is-attention-better-than-matrix-decomposition-1 | - | 13.8 | 45.2 |
semask-semantically-masked-transformers-for-1 | - | - | 57.0 |
activemlp-an-mlp-like-architecture-with | - | 108 | 51.1 |
convnext-v2-co-designing-and-scaling-convnets | - | - | 53.5 |
convnext-v2-co-designing-and-scaling-convnets | - | - | 51.6 |
segvitv2-exploring-efficient-and-continual | - | - | 58.2 |
semask-semantically-masked-transformers-for-1 | - | - | 57.5 |
token-labeling-training-a-85-5-top-1-accuracy | - | 209 | 51.8 |
convmlp-hierarchical-convolutional-mlps-for | - | - | 40 |
dynamic-structured-semantic-propagation | - | - | 43.68 |
moat-alternating-mobile-convolution-and | - | 8 | 43.1 |
efficient-multi-order-gated-aggregation | - | - | 47.7 |
ddp-diffusion-model-for-dense-visual | - | 207 | 54.4 |
semask-semantically-masked-transformers-for-1 | - | 96 | 50.98 |
dilated-neighborhood-attention-transformer | - | - | 54.9 |
object-contextual-representations-for | - | - | 45.28 |
vision-transformer-with-deformable-attention | - | 81 | 48.31 |
psanet-point-wise-spatial-attention-network | - | - | 43.77 |
masked-autoencoders-are-scalable-vision | - | - | 53.6 |
metaformer-is-actually-what-you-need-for | - | - | 42.7 |
multi-scale-context-aggregation-by-dilated | - | - | 32.31 |
twins-revisiting-spatial-attention-design-in | - | - | 50.2 |
object-contextual-representations-for | - | - | 47.98 |
swin-transformer-hierarchical-vision | - | - | 49.7 |
sernet-former-semantic-segmentation-by | - | - | 59.35 |
neighborhood-attention-transformer | - | 123 | 49.7 |
when-shift-operation-meets-vision-transformer | - | - | 46.3 |
ibot-image-bert-pre-training-with-online | - | - | 38.3 |
efficient-self-ensemble-framework-for-1 | - | - | 57.1 |
internimage-exploring-large-scale-vision | 1017 | 80 | 50.9 |
semask-semantically-masked-transformers-for-1 | - | - | 53.52 |
crossformer-a-versatile-vision-transformer | - | - | 51.4 |
is-attention-better-than-matrix-decomposition-1 | - | 61.1 | 51.5 |
visual-attention-network | - | - | 54.7 |
asymmetric-non-local-neural-networks-for | - | - | 45.24 |
pyramid-scene-parsing-network | - | - | 44.94 |
convnext-v2-co-designing-and-scaling-convnets | - | - | 50.5 |
transnext-robust-foveal-visual-perception-for | - | 69 | 54.1 |
dat-spatially-dynamic-vision-transformer-with | - | - | 51.5 |
xcit-cross-covariance-image-transformers | - | - | 44.2 |
biformer-vision-transformer-with-bi-level | - | - | 51.7 |
high-resolution-representations-for-labeling | - | - | 43.2 |
when-shift-operation-meets-vision-transformer | - | - | 47.8 |
augmenting-convolutional-networks-with | - | - | 51.1 |
a-convnet-for-the-2020s | - | 235 | 53.7 |
semask-semantically-masked-transformers-for-1 | - | 35 | 43.16 |
visual-attention-network | - | - | 46.7 |
efficient-multi-order-gated-aggregation | - | - | 49.2 |
object-contextual-representations-for | - | - | 45.66 |
swin-transformer-hierarchical-vision | - | - | 53.50 |
focal-self-attention-for-local-global | - | - | 55.40 |
ibot-image-bert-pre-training-with-online | - | - | 50.0 |
one-peace-exploring-one-general | - | 1500 | 63.0 |
per-pixel-classification-is-not-all-you-need | - | - | 48.1 |
moat-alternating-mobile-convolution-and | - | 496 | 57.6 |
convnext-v2-co-designing-and-scaling-convnets | - | - | 52.1 |
semask-semantically-masked-transformers-for-1 | - | - | 56.2 |
resnest-split-attention-networks | - | - | 46.91 |
is-attention-better-than-matrix-decomposition-1 | - | 45.6 | 51.0 |
dilated-neighborhood-attention-transformer | - | - | 58.1 |
swin-transformer-v2-scaling-up-capacity-and | - | - | 59.9 |
cswin-transformer-a-general-vision | - | - | 55.70 |
internimage-exploring-large-scale-vision | - | 1310 | - |
hrvit-multi-scale-high-resolution-vision | - | 28.7 | 50.2 |
auto-deeplab-hierarchical-neural-architecture | - | - | 43.98 |
colormae-exploring-data-independent-masking | - | - | 49.3 |
convnext-v2-co-designing-and-scaling-convnets | - | - | 53.7 |
semask-semantically-masked-transformers-for-1 | - | - | 58.2 |
convnext-v2-co-designing-and-scaling-convnets | - | - | 52.8 |
vision-transformers-for-dense-prediction | - | - | 49.02 |
exploring-target-representations-for-masked | - | - | 55.2 |
masked-attention-mask-transformer-for | - | - | 57.3 |
is-attention-better-than-matrix-decomposition-1 | - | - | 46.8 |
davit-dual-attention-vision-transformers | - | - | 49.4 |
internimage-exploring-large-scale-vision | 1185 | 128 | 51.3 |
masked-attention-mask-transformer-for | - | - | 55.1 |
2003-13328 | - | - | 45.6 |
internimage-exploring-large-scale-vision | 944 | 59 | 48.1 |
context-encoding-for-semantic-segmentation | - | - | 44.65 |
a-convnet-for-the-2020s | - | 391 | 54 |
global-context-vision-transformers | - | 125 | 49 |
dinov2-learning-robust-visual-features | - | 1080 | 60.2 |
xcit-cross-covariance-image-transformers | - | - | 46.6 |
visual-attention-network | - | 55 | 50.2 |
location-aware-upsampling-for-semantic | - | - | 45.02 |
convnext-v2-co-designing-and-scaling-convnets | - | - | 55 |
architecture-agnostic-masked-image-modeling | - | - | 49 |
dilated-neighborhood-attention-transformer | - | - | 47.2 |
moat-alternating-mobile-convolution-and | - | 13 | 44.9 |
beit-bert-pre-training-of-image-transformers | - | - | 57.0 |
exploring-target-representations-for-masked | - | - | 52.9 |
muxconv-information-multiplexing-in | - | - | 32.42 |
segmenter-transformer-for-semantic | - | - | 53.63 |
efficient-multi-order-gated-aggregation | - | - | 54 |
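The comparison table above can be loaded and ranked programmatically. A minimal sketch, assuming the row format shown (pipe-separated cells, trailing pipe, `-` for a missing value); the three sample rows are copied from the table:

```python
def parse_rows(lines):
    """Parse 'name | GFLOPs | Params | mIoU |' rows; '-' means missing."""
    rows = []
    for line in lines:
        cells = [c.strip() for c in line.strip().strip('|').split('|')]
        name, gflops, params, miou = cells
        rows.append({
            'model': name,
            'gflops': None if gflops == '-' else float(gflops),
            'params_m': None if params == '-' else float(params),
            'miou': None if miou == '-' else float(miou),
        })
    return rows

# Three rows taken verbatim from the comparison table
table = [
    "one-peace-exploring-one-general | - | 1500 | 63.0 |",
    "internimage-exploring-large-scale-vision | 4635 | 1310 | 62.9 |",
    "segformer-simple-and-efficient-design-for | - | 3.8 | 37.4 |",
]
ranked = sorted(parse_rows(table), key=lambda r: r['miou'] or 0.0, reverse=True)
print(ranked[0]['model'])  # entry with the highest validation mIoU in this subset
```

Rows with no reported mIoU sort to the bottom here (`None` is treated as 0.0); filter them out first if they should be excluded entirely.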