HyperAI

Semantic Segmentation on ADE20K

Metrics

GFLOPs
Params (M)
Validation mIoU
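
For readers unfamiliar with the last metric: mIoU (mean intersection over union) averages, over classes, the ratio of correctly labelled pixels to all pixels labelled with that class in either the prediction or the ground truth. The sketch below is a minimal pure-Python illustration under stated assumptions, not the evaluation code used by any model listed here. The function name `mean_iou` and the toy label lists are hypothetical. Benchmark toolkits commonly accumulate the counts over the whole validation set and report the result as a percentage.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in the prediction or the ground truth.

    pred and target are flat sequences of integer class labels of equal length.
    (Hypothetical helper for illustration only.)
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both sequences
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {100 * score:.1f}")  # per-class IoUs are 1/3, 2/3 and 1/2
```

Multiplying the score by 100 puts it on the same scale as the Validation mIoU column in the table.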

Results

Performance results of various models on this benchmark

Comparison table
Model name | GFLOPs | Params (M) | Validation mIoU
internimage-exploring-large-scale-vision | 2526 | 256 | 54.1
towards-sustainable-self-supervised-learning | - | - | 51.0
efficientvit-enhanced-linear-attention-for | - | - | 49
visual-attention-network | - | 49 | 48.1
per-pixel-classification-is-not-all-you-need | - | - | 53.8
co-occurrent-features-in-semantic | - | - | 44.89
a-convnet-for-the-2020s | - | 82 | 49.6
segnet-a-deep-convolutional-encoder-decoder | - | - | 21.64
towards-all-in-one-pre-training-via | - | 1310 | 62.9
hrvit-multi-scale-high-resolution-vision | - | 20.8 | 48.76
swin-transformer-v2-scaling-up-capacity-and | - | - | 53.7
rethinking-semantic-segmentation-from-a | - | - | 50.28
elsa-enhanced-local-self-attention-for-vision | - | - | 50.3
sequential-ensembling-for-semantic | - | - | 46.8
fastvit-a-fast-hybrid-vision-transformer | - | - | -
adaptive-context-network-for-scene-parsing-1 | - | - | 45.90
symbolic-graph-reasoning-meets-convolutions | - | - | 44.32
convmlp-hierarchical-convolutional-mlps-for | - | - | 35.8
semask-semantically-masked-transformers-for-1 | - | - | 58.2
convmlp-hierarchical-convolutional-mlps-for | - | - | 38.6
dat-spatially-dynamic-vision-transformer-with | - | - | 51.2
segformer-simple-and-efficient-design-for | - | 84.7 | 51.8
unireplknet-a-universal-perception-large | - | - | 49.1
fbnetv5-neural-architecture-search-for | - | - | 40.4
neighborhood-attention-transformer | - | 58 | 48.4
vision-transformer-adapter-for-dense | - | 451 | 58.4
hrvit-multi-scale-high-resolution-vision | - | 8.2 | 45.88
when-shift-operation-meets-vision-transformer | - | - | 47.9
self-supervised-learning-with-swin | - | - | 45.58
dilated-neighborhood-attention-transformer | - | - | 50.4
conditional-boundary-loss-for-semantic | - | - | 54.9
contrastive-learning-rivals-masked-image | - | 3000 | 61.4
could-giant-pretrained-image-models-extract | - | - | 57.6
vision-transformer-adapter-for-dense | - | 571 | 60.5
context-prior-for-scene-segmentation | - | - | 46.27
dat-spatially-dynamic-vision-transformer-with | - | - | 50.3
location-aware-upsampling-for-semantic | - | - | 44.55
resnest-split-attention-networks | - | - | 48.36
text-image-alignment-for-diffusion-based | - | - | 55.9
when-shift-operation-meets-vision-transformer | - | - | 49.2
mask-dino-towards-a-unified-transformer-based-1 | - | 223 | 60.8
architecture-agnostic-masked-image-modeling | - | - | 38.3
debiformer-vision-transformer-with-deformable | - | - | 52.0
fully-convolutional-networks-for-semantic-1 | - | - | 29.39
segmenter-transformer-for-semantic | - | - | 50.0
segmenter-transformer-for-semantic | - | - | 49.61
multimae-multi-modal-multi-task-masked | - | - | 46.2
efficient-multi-order-gated-aggregation | - | - | 50.1
internimage-exploring-large-scale-vision | 4635 | 1310 | 62.9
vision-transformer-adapter-for-dense | - | 571 | 61.5
harnessing-diffusion-models-for-visual | - | - | 56.8
transnext-robust-foveal-visual-perception-for | - | 109 | 54.7
visual-attention-network | - | 8 | 38.5
exploring-target-representations-for-masked | - | - | 50.8
region-rebalance-for-long-tailed-semantic | - | - | 57.7
moat-alternating-mobile-convolution-and | - | 6 | 41.2
vision-transformer-with-deformable-attention | - | 60 | 45.54
resnest-split-attention-networks | - | - | 47.60
pyramid-scene-parsing-network | - | - | 43.29
exploring-target-representations-for-masked | - | - | 56.2
xcit-cross-covariance-image-transformers | - | - | 47.1
a-convnet-for-the-2020s | - | 122 | 53.1
moat-alternating-mobile-convolution-and | - | 198 | 56.5
xcit-cross-covariance-image-transformers | - | - | 46.9
efficient-multi-order-gated-aggregation | - | - | 50.9
improve-vision-transformers-training-by | - | - | 54.4
representation-separation-for-semantic | - | 330 | 58.4
dilated-neighborhood-attention-transformer | - | - | 48.8
unified-perceptual-parsing-for-scene | - | - | 42.66
neighborhood-attention-transformer | - | 50 | 46.4
eva-exploring-the-limits-of-masked-visual | - | 1074 | 62.3
masked-autoencoders-are-scalable-vision | - | - | 48.1
reversible-column-networks | - | 2439 | 61.0
is-attention-better-than-matrix-decomposition-1 | - | 27.4 | 49.6
global-context-vision-transformers | - | 58 | 46.5
unireplknet-a-universal-perception-large | - | - | 55
fastvit-a-fast-hybrid-vision-transformer | - | - | -
dilated-neighborhood-attention-transformer | - | - | 54.6
dcnas-densely-connected-neural-architecture | - | - | 47.12
dilated-neighborhood-attention-transformer | - | - | 49.9
a-convnet-for-the-2020s | - | 122 | 49.9
shuffle-transformer-rethinking-spatial | - | - | 47.6
refinenet-multi-path-refinement-networks-for | - | - | 40.7
focal-modulation-networks | - | - | 58.5
condnet-conditional-classifier-for-scene | - | - | 47.38
fastvit-a-fast-hybrid-vision-transformer | - | - | -
unireplknet-a-universal-perception-large | - | - | 53.9
xcit-cross-covariance-image-transformers | - | - | 48.1
condnet-conditional-classifier-for-scene | - | - | 47.54
augmenting-convolutional-networks-with | - | - | 49.3
moat-alternating-mobile-convolution-and | - | 81 | 54.7
fastfcn-rethinking-dilated-convolution-in-the | - | - | 44.34
disentangled-non-local-neural-networks | - | - | 45.97
masked-attention-mask-transformer-for | - | - | 57.7
dynamic-focus-aware-positional-queries-for | - | - | 57.7
shuffle-transformer-rethinking-spatial | - | - | 50.5
pyramid-scene-parsing-network | - | - | 43.51
sequential-ensembling-for-semantic | - | 216.3 | 54
context-autoencoder-for-self-supervised | - | - | 54.7
augmenting-convolutional-networks-with | - | - | 52.9
efficient-self-ensemble-framework-for-1 | - | - | 54.2
ibot-image-bert-pre-training-with-online | - | - | 45.4
internimage-exploring-large-scale-vision | 3142 | 368 | 55.3
convnext-v2-co-designing-and-scaling-convnets | - | - | 54.2
convnext-v2-co-designing-and-scaling-convnets | - | - | 49.9
segformer-simple-and-efficient-design-for | - | 64.1 | 51.1
adaptive-context-network-for-scene-parsing-1 | - | - | 45.90
vision-transformer-with-deformable-attention | - | 121 | 49.38
augmenting-convolutional-networks-with | - | - | 52.8
neighborhood-attention-transformer | - | 82 | 49.5
unireplknet-a-universal-perception-large | - | - | 52.7
conditional-boundary-loss-for-semantic | - | - | 56.1
scene-segmentation-with-dual-relation-aware | - | - | 46.18
segformer-simple-and-efficient-design-for | - | 3.8 | 37.4
k-net-towards-unified-image-segmentation | - | - | 54.3
fapn-feature-aligned-pyramid-network-for | - | - | 56.7
semask-semantically-masked-transformers-for-1 | - | 56 | 47.63
image-as-a-foreign-language-beit-pretraining | - | 1900 | 62.8
pyramidal-convolution-rethinking | - | - | 45.99
volo-vision-outlooker-for-visual-recognition | - | - | 54.3
transnext-robust-foveal-visual-perception-for | - | 47.5 | 53.4
xcit-cross-covariance-image-transformers | - | - | 48.4
generalized-parametric-contrastive-learning | - | - | 54.3
masked-attention-mask-transformer-for | - | - | 56.4
davit-dual-attention-vision-transformers | - | - | 46.3
unireplknet-a-universal-perception-large | - | - | 55.6
muxconv-information-multiplexing-in | - | - | 35.8
global-context-vision-transformers | - | 84 | 48.3
beyond-self-attention-external-attention | - | - | 45.33
moat-alternating-mobile-convolution-and | - | 24 | 47.5
biformer-vision-transformer-with-bi-level | - | - | 50.8
unireplknet-a-universal-perception-large | - | - | 51
visual-attention-network | - | 18 | 42.9
a-convnet-for-the-2020s | - | 60 | 46.7
hornet-efficient-high-order-spatial | - | - | 57.9
is-attention-better-than-matrix-decomposition-1 | - | 13.8 | 45.2
semask-semantically-masked-transformers-for-1 | - | - | 57.0
activemlp-an-mlp-like-architecture-with | - | 108 | 51.1
convnext-v2-co-designing-and-scaling-convnets | - | - | 53.5
convnext-v2-co-designing-and-scaling-convnets | - | - | 51.6
segvitv2-exploring-efficient-and-continual | - | - | 58.2
semask-semantically-masked-transformers-for-1 | - | - | 57.5
token-labeling-training-a-85-5-top-1-accuracy | - | 209 | 51.8
convmlp-hierarchical-convolutional-mlps-for | - | - | 40
dynamic-structured-semantic-propagation | - | - | 43.68
moat-alternating-mobile-convolution-and | - | 8 | 43.1
efficient-multi-order-gated-aggregation | - | - | 47.7
ddp-diffusion-model-for-dense-visual | - | 207 | 54.4
semask-semantically-masked-transformers-for-1 | - | 96 | 50.98
dilated-neighborhood-attention-transformer | - | - | 54.9
object-contextual-representations-for | - | - | 45.28
vision-transformer-with-deformable-attention | - | 81 | 48.31
psanet-point-wise-spatial-attention-network | - | - | 43.77
masked-autoencoders-are-scalable-vision | - | - | 53.6
metaformer-is-actually-what-you-need-for | - | - | 42.7
multi-scale-context-aggregation-by-dilated | - | - | 32.31
twins-revisiting-spatial-attention-design-in | - | - | 50.2
object-contextual-representations-for | - | - | 47.98
swin-transformer-hierarchical-vision | - | - | 49.7
sernet-former-semantic-segmentation-by | - | - | 59.35
neighborhood-attention-transformer | - | 123 | 49.7
when-shift-operation-meets-vision-transformer | - | - | 46.3
ibot-image-bert-pre-training-with-online | - | - | 38.3
efficient-self-ensemble-framework-for-1 | - | - | 57.1
internimage-exploring-large-scale-vision | 1017 | 80 | 50.9
semask-semantically-masked-transformers-for-1 | - | - | 53.52
crossformer-a-versatile-vision-transformer | - | - | 51.4
is-attention-better-than-matrix-decomposition-1 | - | 61.1 | 51.5
visual-attention-network | - | - | 54.7
asymmetric-non-local-neural-networks-for | - | - | 45.24
pyramid-scene-parsing-network | - | - | 44.94
convnext-v2-co-designing-and-scaling-convnets | - | - | 50.5
transnext-robust-foveal-visual-perception-for | - | 69 | 54.1
dat-spatially-dynamic-vision-transformer-with | - | - | 51.5
xcit-cross-covariance-image-transformers | - | - | 44.2
biformer-vision-transformer-with-bi-level | - | - | 51.7
high-resolution-representations-for-labeling | - | - | 43.2
when-shift-operation-meets-vision-transformer | - | - | 47.8
augmenting-convolutional-networks-with | - | - | 51.1
a-convnet-for-the-2020s | - | 235 | 53.7
semask-semantically-masked-transformers-for-1 | - | 35 | 43.16
visual-attention-network | - | - | 46.7
efficient-multi-order-gated-aggregation | - | - | 49.2
object-contextual-representations-for | - | - | 45.66
swin-transformer-hierarchical-vision | - | - | 53.50
focal-self-attention-for-local-global | - | - | 55.40
ibot-image-bert-pre-training-with-online | - | - | 50.0
one-peace-exploring-one-general | - | 1500 | 63.0
per-pixel-classification-is-not-all-you-need | - | - | 48.1
moat-alternating-mobile-convolution-and | - | 496 | 57.6
convnext-v2-co-designing-and-scaling-convnets | - | - | 52.1
semask-semantically-masked-transformers-for-1 | - | - | 56.2
resnest-split-attention-networks | - | - | 46.91
is-attention-better-than-matrix-decomposition-1 | - | 45.6 | 51.0
dilated-neighborhood-attention-transformer | - | - | 58.1
swin-transformer-v2-scaling-up-capacity-and | - | - | 59.9
cswin-transformer-a-general-vision | - | - | 55.70
internimage-exploring-large-scale-vision | - | 1310 | -
hrvit-multi-scale-high-resolution-vision | - | 28.7 | 50.2
auto-deeplab-hierarchical-neural-architecture | - | - | 43.98
colormae-exploring-data-independent-masking | - | - | 49.3
fastvit-a-fast-hybrid-vision-transformer | - | - | -
convnext-v2-co-designing-and-scaling-convnets | - | - | 53.7
semask-semantically-masked-transformers-for-1 | - | - | 58.2
convnext-v2-co-designing-and-scaling-convnets | - | - | 52.8
vision-transformers-for-dense-prediction | - | - | 49.02
exploring-target-representations-for-masked | - | - | 55.2
masked-attention-mask-transformer-for | - | - | 57.3
is-attention-better-than-matrix-decomposition-1 | - | - | 46.8
davit-dual-attention-vision-transformers | - | - | 49.4
internimage-exploring-large-scale-vision | 1185 | 128 | 51.3
masked-attention-mask-transformer-for | - | - | 55.1
2003-13328 | - | - | 45.6
internimage-exploring-large-scale-vision | 944 | 59 | 48.1
context-encoding-for-semantic-segmentation | - | - | 44.65
a-convnet-for-the-2020s | - | 391 | 54
global-context-vision-transformers | - | 125 | 49
dinov2-learning-robust-visual-features | - | 1080 | 60.2
xcit-cross-covariance-image-transformers | - | - | 46.6
visual-attention-network | - | 55 | 50.2
location-aware-upsampling-for-semantic | - | - | 45.02
convnext-v2-co-designing-and-scaling-convnets | - | - | 55
architecture-agnostic-masked-image-modeling | - | - | 49
dilated-neighborhood-attention-transformer | - | - | 47.2
moat-alternating-mobile-convolution-and | - | 13 | 44.9
beit-bert-pre-training-of-image-transformers | - | - | 57.0
exploring-target-representations-for-masked | - | - | 52.9
muxconv-information-multiplexing-in | - | - | 32.42
segmenter-transformer-for-semantic | - | - | 53.63
efficient-multi-order-gated-aggregation | - | - | 54