HyperAI

Object Detection on COCO minival

Metrics

AP50
AP75
APL
APM
APS
box AP
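The metric names above follow the usual COCO detection conventions: AP50 and AP75 are average precision at IoU thresholds of 0.50 and 0.75, box AP averages over thresholds from 0.50 to 0.95 in steps of 0.05, and APS, APM and APL restrict the evaluation to small, medium and large objects. The sketch below only illustrates the IoU thresholds and size buckets; it is not the evaluation code used for this leaderboard. The helper names (`box_iou`, `thresholds_passed`, `size_bucket`) are hypothetical, and the handling of areas exactly on a bucket boundary is an assumption.

```python
# Hypothetical helpers illustrating COCO-style IoU thresholds and size buckets.
# Boxes are (x1, y1, x2, y2) in pixels; this is a sketch, not an evaluator.

# IoU thresholds averaged by box AP: 0.50, 0.55, ..., 0.95
COCO_IOU_THRESHOLDS = [t / 100 for t in range(50, 100, 5)]


def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def thresholds_passed(iou):
    """Thresholds at which a detection with this IoU counts as a match."""
    return [t for t in COCO_IOU_THRESHOLDS if iou >= t]


def size_bucket(area):
    """Size bucket by object area in pixels (boundary handling assumed)."""
    if area < 32 ** 2:
        return "small"   # contributes to APS
    if area < 96 ** 2:
        return "medium"  # contributes to APM
    return "large"       # contributes to APL


if __name__ == "__main__":
    iou = box_iou((0, 0, 10, 10), (0, 0, 10, 7))
    print(iou, thresholds_passed(iou))
    print(size_bucket(50 * 50))
```

A detection with IoU 0.7 against its ground-truth box would count as correct for AP50 but not for AP75, which is why AP75 is lower than AP50 in every row of the table that reports both.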

Results

Performance results of various models on this benchmark

Comparison table
| Model name | AP50 | AP75 | APL | APM | APS | box AP |
| --- | --- | --- | --- | --- | --- | --- |
| resnest-split-attention-networks | 71.00 | 57.07 | 66.29 | 56.36 | 36.80 | 52.47 |
| bottom-up-object-detection-by-grouping | 55.1 | 43.7 | 56.1 | 44.0 | 21.6 | 40.3 |
| mask-r-cnn | 59.5 | 38.9 | - | - | - | 36.7 |
| feature-pyramid-networks-for-object-detection | 61.3 | 43.3 | 52.6 | 43.3 | 22.9 | 39.8 |
| hiera-a-hierarchical-vision-transformer | - | - | - | - | - | 55 |
| moat-alternating-mobile-convolution-and | - | - | - | - | - | 58.5 |
| a-strong-and-reproducible-object-detector | 81.5 | 71.4 | 78.5 | 68.5 | 50.4 | 64.6 |
| bottleneck-transformers-for-visual | 71 | 54.2 | - | - | - | 49.5 |
| resnest-split-attention-networks | 69.53 | 55.40 | 65.83 | 54.66 | 32.67 | 50.91 |
| cascade-r-cnn-delving-into-high-quality | 61.6 | 46.6 | 57.4 | 46.2 | 23.8 | 42.7 |
| non-local-neural-networks | 63.1 | 44.5 | - | - | - | 40.8 |
| end-to-end-object-detection-with-transformers | 64.7 | 47.7 | 62.3 | 49.5 | 23.7 | 44.9 |
| xcit-cross-covariance-image-transformers | - | - | - | - | - | 48.1 |
| rethinking-and-improving-relative-position | - | - | - | - | - | 40.8 |
| swin-transformer-v2-scaling-up-capacity-and | - | - | - | - | - | 62.5 |
| deformable-convnets-v2-more-deformable-better | - | - | - | - | - | 43.1 |
| gcnet-non-local-networks-meet-squeeze | 62.4 | 44 | 52.5 | 44.4 | 24.2 | 40.3 |
| reppoints-point-set-representation-for-object | - | - | - | - | - | 46.8 |
| a-ranking-based-balanced-loss-function | 58.8 | 41.5 | - | - | - | 39.7 |
| a-ranking-based-balanced-loss-function | 60.7 | 43.3 | - | - | - | 40.7 |
| deep-residual-learning-for-image-recognition | 61.9 | 47.0 | - | - | - | 43.5 |
| elsa-enhanced-local-self-attention-for-vision | 70.5 | 56.0 | - | - | - | 51.6 |
| 190807919 | - | - | 62.2 | 50.3 | 28.8 | 47.0 |
| grid-r-cnn | 58.3 | 42.4 | 51.5 | 43.8 | 22.6 | 39.6 |
| centermask-real-time-anchor-free-instance-1 | - | - | 58.8 | - | 29.2 | 45.6 |
| group-normalization | 61.6 | 44.4 | - | - | - | 40.8 |
| recurrent-glimpse-based-decoder-for-detection | 67.5 | 53.1 | 65 | 52.6 | 30 | 49.1 |
| rethinking-imagenet-pre-training | 66.8 | 52.9 | - | - | - | 48.6 |
| sparse-r-cnn-end-to-end-object-detection-with | 64.6 | 49.5 | 61.6 | 48.3 | 28.3 | 45.6 |
| 190909777 | 55.3 | - | - | - | - | 35.6 |
| cornernet-detecting-objects-as-paired | 53.8 | 40.9 | 51.8 | 40.5 | 18.6 | 38.4 |
| a-novel-region-of-interest-extraction-layer | 59.9 | 41.7 | 49.7 | 42.1 | 22.9 | 38.4 |
| 190408900 | - | - | 57.1 | 43.5 | 23.8 | 41.4 |
| eva-exploring-the-limits-of-masked-visual | 82.1 | 70.8 | 78.5 | 68.4 | 49.4 | 64.5 |
| pyramid-vision-transformer-a-versatile | 63.6 | 46.1 | 59.5 | 46.0 | 26.1 | 43.4 |
| reducing-label-noise-in-anchor-free-object | 59.5 | 44.2 | 52.3 | 44.7 | 25.4 | 40.5 |
| efficientdet-scalable-and-efficient-object | 73.4 | 59.0 | 67.9 | 58.0 | 40.0 | - |
| reppoints-point-set-representation-for-object | - | - | - | - | - | 40.3 |
| end-to-end-object-detection-with-transformers | 63.9 | 47.8 | 56 | 48.1 | 27.2 | 44 |
| houghnet-integrating-near-and-long-range | 62.2 | 46.9 | 55.8 | 47.6 | 25.5 | 43.0 |
| context-autoencoder-for-self-supervised | - | - | - | - | - | 54.5 |
| conditional-detr-for-fast-training | 65.4 | 48.5 | 62.2 | 49 | 25.3 | 45.1 |
| 190408900 | - | - | 58.4 | 44.3 | 25.5 | 42.6 |
| bottleneck-transformers-for-visual | 71.3 | 54.6 | - | - | - | 49.7 |
| improved-multiscale-vision-transformers-for | - | - | - | - | - | 58.7 |
| attentive-normalization | 66.2 | 49.1 | - | - | - | 44.9 |
| centermask-real-time-anchor-free-instance-1 | 67.8 | - | - | - | - | 48.6 |
| exploring-plain-vision-transformer-backbones | - | - | - | - | - | 60.4 |
| retinamask-learning-to-predict-masks-improves | 60.2 | 44.1 | - | - | - | 41.1 |
| rethinking-and-improving-relative-position | - | - | - | - | - | 42.3 |
| bottom-up-object-detection-by-grouping | 59.6 | 46.8 | 59.4 | 46.6 | 25.7 | 43.3 |
| masked-autoencoders-are-scalable-vision | - | - | - | - | - | 53.3 |
| rethinking-pre-training-and-self-training | - | - | - | - | - | 54.2 |
| mask-r-cnn | - | - | - | - | - | 37.7 |
| anchor-detr-query-design-for-transformer | 65.7 | 48.8 | 61.6 | 49.4 | 25.8 | 45.1 |
| detrs-with-collaborative-hybrid-assignments | - | - | - | - | - | 65.9 |
| dn-detr-accelerate-detr-training-by | 67.6 | 53.8 | 65.4 | 52.6 | 31.3 | 49.5 |
| conditional-detr-for-fast-training | 64 | 45.7 | 61.5 | 46.7 | 22.7 | 43 |
| resnest-split-attention-networks | 68.78 | 55.17 | 63.9 | 54.2 | - | 50.54 |
| a-ranking-based-balanced-loss-function | 60.3 | 42.3 | - | - | - | 40.2 |
| dino-detr-with-improved-denoising-anchor-1 | 69.1 | 56 | 65.8 | 54.2 | 34.5 | 51.3 |
| vit-comer-vision-transformer-with | - | - | - | - | - | 64.3 |
| foveabox-beyond-anchor-based-object-detector | 57.8 | 40.5 | - | - | - | 38.1 |
| moat-alternating-mobile-convolution-and | - | - | - | - | - | 57.7 |
| general-object-foundation-model-for-images | - | - | - | - | - | 62.0 |
| gradient-harmonized-single-stage-detector | 55.5 | 38.1 | 46.7 | 39.6 | 19.6 | 35.8 |
| dynamic-head-unifying-object-detection-heads | 78.2 | - | 74.2 | - | - | 60.3 |
| efficientdet-scalable-and-efficient-object | - | - | - | - | - | 52.1 |
| centermask-real-time-anchor-free-instance-1 | - | - | 57.7 | - | 28.5 | 44.9 |
| virtex-learning-visual-representations-from | - | - | - | - | - | 40.9 |
| towards-all-in-one-pre-training-via | - | - | - | - | - | 65.0 |
| yolov6-v3-0-a-full-scale-reloading | 74.5 | - | - | - | - | 57.2 |
| pix2seq-a-language-modeling-framework-for | - | - | - | - | - | 47.3 |
| grid-r-cnn | 60.3 | 44.4 | 54.1 | 45.8 | 23.4 | 41.3 |
| augmenting-convolutional-networks-with | - | - | - | - | - | 46.4 |
| a-novel-region-of-interest-extraction-layer | 59.2 | 40.6 | 47.8 | 41.5 | 22.3 | 37.5 |
| recursively-refined-r-cnn-instance | 64.1 | 48.4 | 58.9 | 47.1 | 27 | 44.3 |
| internimage-exploring-large-scale-vision | - | - | - | - | - | 65.0 |
| metaformer-is-actually-what-you-need-for | 63.1 | 44.8 | - | - | - | 41.0 |
| centernet-object-detection-with-keypoint | 59.2 | 43.9 | 55.8 | 43.8 | 23.6 | 41.3 |
| uniform-masking-enabling-mae-pre-training-for | - | - | - | - | - | 57.4 |
| moat-alternating-mobile-convolution-and | - | - | - | - | - | 59.2 |
| 190807919 | 59.2 | 44.9 | 54.1 | 44.2 | 23.7 | 41.3 |
| pyramid-vision-transformer-a-versatile | 63.7 | 45.4 | 58.4 | 46.0 | 25.8 | 42.6 |
| 190807919 | - | - | - | 45.4 | 25.0 | 42.3 |
| improved-multiscale-vision-transformers-for | - | - | - | - | - | 54.3 |
| deformable-convnets-v2-more-deformable-better | - | - | 58.7 | 45.8 | 22.2 | 41.7 |
| centermask-real-time-anchor-free-instance-1 | - | - | - | 48.3 | 27.7 | 44.6 |
| sparse-r-cnn-end-to-end-object-detection-with | 62.1 | 47.2 | 59.7 | 46.3 | 26.1 | 43.5 |
| dino-detr-with-improved-denoising-anchor-1 | 69 | 55.8 | 65.3 | 54.3 | 35 | 51.2 |
| scale-aware-trident-networks-for-object | 63.5 | 45.5 | 56.9 | 47 | 24.9 | 42 |
| transnext-robust-foveal-visual-perception-for | - | - | - | - | - | 55.7 |
| improved-multiscale-vision-transformers-for | - | - | - | - | - | 56.1 |
| conditional-detr-for-fast-training | 66.8 | 49.5 | 63.3 | 50.3 | 27.2 | 45.9 |
| pix2seq-a-language-modeling-framework-for | - | - | - | - | - | 50.0 |
| augmenting-convolutional-networks-with | - | - | - | - | - | 47.0 |
| hybrid-task-cascade-for-instance-segmentation | 59.4 | 40.7 | 52.3 | 40.9 | 20.3 | 43.2 |
| xcit-cross-covariance-image-transformers | - | - | - | - | - | 48.5 |
| spinenet-learning-scale-permuted-backbone-for | - | - | - | - | - | 52.2 |
| end-to-end-semi-supervised-object-detection | - | - | - | - | - | 60.1 |
| 190807919 | - | - | - | 47.9 | 26.1 | - |
| res2net-a-new-multi-scale-backbone | 53.6 | - | 51.1 | 38.3 | 14 | 33.7 |
| mask-r-cnn | - | - | - | - | - | 40.0 |
| pix2seq-a-language-modeling-framework-for | - | - | - | - | - | 42.6 |
| general-object-foundation-model-for-images | - | - | - | - | - | 55.0 |
| exploring-plain-vision-transformer-backbones | - | - | - | - | - | 61.3 |
| feature-selective-anchor-free-module-for | 55.0 | 37.9 | 48.2 | 39.6 | 19.8 | 35.9 |
| reppoints-point-set-representation-for-object | - | - | - | - | - | 40.8 |
| simple-copy-paste-is-a-strong-data | - | - | - | - | - | 54.5 |
| pix2seq-a-language-modeling-framework-for | - | - | - | - | - | 47.1 |
| feature-selective-anchor-free-module-for | 62.4 | - | - | - | - | 41.6 |
| moat-alternating-mobile-convolution-and | - | - | - | - | - | 55.9 |
| non-local-neural-networks | 67.8 | 48.9 | - | - | - | 45.0 |
| res2net-a-new-multi-scale-backbone | 66.5 | 51.3 | 62.1 | 51.6 | 28.6 | 47.5 |
| when-shift-operation-meets-vision-transformer | - | - | - | 42.3 | - | - |
| 2103-15358 | - | 47.6 | 58.1 | 48 | 29.9 | 44.7 |
| centermask-real-time-anchor-free-instance-1 | - | - | 57.1 | - | 26.7 | 44.4 |
| houghnet-integrating-near-and-long-range | 64.6 | 50.3 | 59.7 | 48.8 | 30.0 | 46.1 |
| 190807919 | 58.9 | 41.5 | 49.6 | 40.8 | 22.6 | 38.0 |
| davit-dual-attention-vision-transformers | - | - | - | - | - | 49.9 |
| gcnet-non-local-networks-meet-squeeze | 66.9 | 52.2 | - | - | - | 47.9 |
| recursively-refined-r-cnn-instance | 61 | 46.3 | 55.7 | 45.2 | 24.5 | 42 |
| 190807919 | 62.7 | 48.7 | 58.5 | 48.1 | 26.3 | 44.6 |
| universal-instance-perception-as-object | 77.5 | 66.7 | 75.3 | 64.8 | 45.1 | 60.6 |
| improved-multiscale-vision-transformers-for | - | - | - | - | - | 52.7 |
| dino-detr-with-improved-denoising-anchor-1 | - | - | - | - | - | 63.2 |
| swin-transformer-hierarchical-vision | - | - | - | - | - | 57.1 |
| usb-universal-scale-object-detection | 69.5 | 55.4 | 65.8 | 55.5 | 33.5 | 50.9 |
| transnext-robust-foveal-visual-perception-for | - | - | - | - | - | 57.1 |
| moat-alternating-mobile-convolution-and | - | - | - | - | - | 55.2 |
| general-object-foundation-model-for-images | - | - | - | - | - | 60.4 |
| cbnetv2-a-composite-backbone-network | - | - | - | - | - | 59.6 |
| dynamic-head-unifying-object-detection-heads | - | - | - | - | - | 46.5 |
| recursively-refined-r-cnn-instance | 64.3 | 48.9 | 59.6 | 48.3 | 26.6 | 44.8 |
| recursively-refined-r-cnn-instance | 61.2 | 45.6 | - | - | 24.4 | - |
| usb-universal-scale-object-detection | 70.8 | 58.9 | 68.1 | 57.5 | 36.9 | 53.5 |
| 190807919 | 62.8 | 45.9 | 54.6 | 44.7 | - | 41.8 |
| reppoints-point-set-representation-for-object | - | - | - | - | - | 46.4 |
| cp-detr-concept-prompt-guide-detr-toward | - | - | - | - | - | 64.1 |
| pvtv2-improved-baselines-with-pyramid-vision | 69.5 | 54.9 | - | - | - | 50.1 |
| fcos-fully-convolutional-one-stage-object | 57.4 | 41.4 | 49.8 | 42.5 | 22.3 | 38.6 |
| global-context-networks | 70.4 | 56.1 | - | - | - | 51.8 |
| sparse-r-cnn-end-to-end-object-detection-with | 61.2 | 45.7 | 57.6 | 44.6 | 26.7 | 42.3 |
| conditional-detr-for-fast-training | 65.6 | 47.5 | 63.6 | 48.4 | 23.6 | 44.5 |
| moat-alternating-mobile-convolution-and | - | - | - | - | - | 50.5 |
| dab-detr-dynamic-anchor-boxes-are-better-1 | 67 | 50.2 | 64.1 | 50.5 | 28.1 | 46.6 |
| pix2seq-a-language-modeling-framework-for | 61.0 | 46.1 | 58.6 | 47 | 26.6 | 43.2 |
| swin-transformer-hierarchical-vision | - | - | - | - | - | 58 |
| feature-selective-anchor-free-module-for | 58.0 | - | - | - | - | 37.9 |
| sparse-r-cnn-end-to-end-object-detection-with | 63.4 | 48.2 | 59.5 | 47.2 | 26.9 | 44.5 |
| simple-training-strategies-and-model-scaling | - | - | 70.3 | 56.2 | 33.9 | 53.1 |
| revisiting-efficient-object-detection | 65.5 | 52.2 | 61.1 | 51.9 | 30.3 | 47.8 |
| focal-self-attention-for-local-global | 77.2 | - | 73.4 | - | - | 58.7 |
| deep-residual-learning-for-image-recognition | 63.0 | 48.3 | - | - | - | 44.5 |
| 190807919 | - | - | 60.1 | - | 27.5 | 46.0 |
| you-only-learn-one-representation-unified | 70.6 | 57.4 | 65.2 | 57.3 | 37.4 | - |
| m2det-a-single-shot-object-detector-based-on | 53.7 | - | 49.3 | 39.5 | 15.9 | 34.1 |
| x-volution-on-the-unification-of-convolution | 64 | 46.4 | 55 | 46 | 26.9 | 42.8 |
| vision-transformer-adapter-for-dense | - | - | - | - | - | 60.2 |
| foveabox-beyond-anchor-based-object-detector | 57.8 | 40.2 | 52.7 | 42.2 | 19.5 | 38 |
| m2det-a-single-shot-object-detector-based-on | 52.2 | - | 49.1 | 38.2 | 15 | 33.2 |
| group-normalization | 61 | 44 | - | - | - | 40.3 |
| moat-alternating-mobile-convolution-and | - | - | - | - | - | 53.0 |
| understanding-the-robustness-in-vision | - | - | - | - | - | 55.1 |
| rethinking-imagenet-pre-training | 67.1 | 51.1 | - | - | - | 46.4 |
| 2103-15358 | 65.5 | 47.1 | 58.3 | 47.9 | 28.9 | 44.3 |
| could-giant-pretrained-image-models-extract | - | - | - | - | - | 59.3 |
| weight-standardization | 64.15 | 47.11 | 56.39 | 47.19 | 25.49 | 43.12 |
| reppoints-point-set-representation-for-object | - | - | - | - | - | 38.6 |
| anchor-detr-query-design-for-transformer | 64.7 | 47.5 | 60.6 | 48.2 | 24.7 | 44.2 |
| 190807919 | - | - | 51.0 | 41.7 | - | 39.2 |
| vision-transformer-adapter-for-dense | - | - | - | - | - | 60.5 |
| bottleneck-transformers-for-visual | - | - | - | - | - | 45.9 |
| feature-selective-anchor-free-module-for | 59.2 | - | - | - | - | 39.3 |
| focal-modulation-networks | - | - | - | - | - | 64.2 |
| scaled-yolov4-scaling-cross-stage-partial | 73.3 | 60.7 | 67.4 | 59.5 | 38.1 | 55.4 |
| florence-a-new-foundation-model-for-computer | - | - | - | - | - | 62 |
| end-to-end-semi-supervised-object-detection | - | - | - | - | - | 60.7 |
| queryinst-parallelly-supervised-mask-query | 75.8 | 61.7 | 71.5 | 59.8 | 40.2 | 56.1 |
| reversible-column-networks | - | - | - | - | - | 63.8 |
| reppoints-point-set-representation-for-object | - | - | - | - | - | 44.5 |
| detrs-with-collaborative-hybrid-assignments | - | - | - | - | - | 64.7 |
| focal-modulation-networks | 70.3 | 56.0 | - | - | - | 51.5 |
| focal-modulation-networks | 70.1 | 55.8 | - | - | - | - |
| simple-copy-paste-is-a-strong-data | - | - | - | - | - | 57.0 |
| activemlp-an-mlp-like-architecture-with | - | - | - | - | - | 52.3 |
| moat-alternating-mobile-convolution-and | - | - | - | - | - | 51.9 |
| foveabox-beyond-anchor-based-object-detector | 58.4 | 41.5 | 51.7 | 43.5 | 22.3 | 38.9 |
| grounded-language-image-pre-training | - | - | - | - | - | 60.8 |
| transnext-robust-foveal-visual-perception-for | - | - | - | - | - | 56.6 |
| masked-autoencoders-are-scalable-vision | - | - | - | - | - | 50.3 |
| 190807919 | 61.8 | 44.8 | 53.3 | 43.7 | 24.4 | 40.9 |
| group-normalization | 62.8 | 46.2 | - | - | - | 42.3 |
| 190807919 | 61.7 | 47.7 | 57.4 | 46.5 | 25.6 | 43.7 |
| dynamic-head-unifying-object-detection-heads | 76.8 | - | 73.2 | 62.2 | 44.5 | 58.4 |
| simple-training-strategies-and-model-scaling | - | - | 70.6 | 56.7 | 34.5 | 53.6 |
| hornet-efficient-high-order-spatial | - | - | - | - | - | 59.2 |
| lip-local-importance-based-pooling | 63.6 | 45.6 | - | 45.8 | 25.2 | 41.7 |
| 190807919 | - | - | 59.5 | 48.4 | 27.0 | 45.3 |
| grounding-dino-marrying-dino-with-grounded | - | - | - | - | - | 63.0 |
| solq-segmenting-objects-by-learning-queries | 74.9 | 61.3 | 71.9 | - | - | - |
| adaptively-connected-neural-networks | - | - | - | - | - | 39.5 |
| deep-residual-learning-for-image-recognition | 64.3 | 50.5 | - | - | - | 46.3 |
| pix2seq-a-language-modeling-framework-for | 63.2 | 48.6 | 60.4 | 48.9 | 28.2 | 45.0 |
| usb-universal-scale-object-detection | 67.0 | 52.6 | 62.7 | 52.7 | 30.6 | 48.5 |
| dynamic-head-unifying-object-detection-heads | - | - | 66.3 | - | - | - |
| cbnetv2-a-composite-backbone-network | - | - | - | - | - | 59.1 |
| rethinking-imagenet-pre-training | - | - | - | - | - | 47.4 |
| libra-r-cnn-towards-balanced-learning-for | 59.3 | 42.0 | 50.5 | 42.1 | 22.9 | 38.5 |
| reppoints-point-set-representation-for-object | - | - | - | - | - | 44.8 |
| internimage-exploring-large-scale-vision | - | - | - | - | - | 64.2 |
| 190807919 | - | - | - | 46.0 | 26.6 | 43.1 |
| elsa-enhanced-local-self-attention-for-vision | 70.4 | 52.9 | - | - | - | 48.3 |
| towards-sustainable-self-supervised-learning | - | - | - | - | - | 54.6 |
| cascade-r-cnn-delving-into-high-quality | 59.4 | 43.7 | 54.1 | 43.7 | 22.9 | 40.3 |
| non-local-neural-networks | 61.1 | 41.9 | - | - | - | 39.0 |
| you-only-learn-one-representation-unified | 73.5 | 60.6 | 68.7 | 60.1 | 40.4 | - |
| foveabox-beyond-anchor-based-object-detector | 55.2 | 37.9 | 50.5 | 39.4 | 18.6 | 36.0 |
| dab-detr-dynamic-anchor-boxes-are-better-1 | 64.7 | 47.2 | 62.9 | 48.2 | 24.1 | 44.1 |