HyperAI

Cross-Modal Retrieval on COCO 2014

Metrics

Image-to-text R@1
Image-to-text R@10
Image-to-text R@5
Text-to-image R@1
Text-to-image R@10
Text-to-image R@5
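
The R@K metrics listed above are recall-at-K scores, reported as percentages. The sketch below assumes the common definition, in which a query counts as a hit when at least one correct match appears among its top K ranked candidates. The function name `recall_at_k` and the toy scores are hypothetical and do not come from the benchmark; the exact evaluation protocol (for example, the test split size) is not stated on this page.

```python
from typing import Sequence, Set


def recall_at_k(
    scores: Sequence[Sequence[float]],
    relevant: Sequence[Set[int]],
    k: int,
) -> float:
    """Percentage of queries with at least one relevant candidate in the top k.

    scores[q][c] is the similarity of query q to candidate c (higher is better);
    relevant[q] is the set of candidate indices that count as correct for q.
    """
    hits = 0
    for row, correct in zip(scores, relevant):
        # Rank candidate indices by descending similarity and keep the top k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if correct.intersection(top_k):
            hits += 1
    return 100.0 * hits / len(scores)


# Toy example (hypothetical data): 2 image queries, 4 candidate captions.
scores = [
    [0.9, 0.1, 0.3, 0.2],  # highest-scoring caption is index 0
    [0.2, 0.8, 0.1, 0.5],  # highest-scoring caption is index 1
]
relevant = [{0, 1}, {2, 3}]  # captions that actually describe each image

r_at_1 = recall_at_k(scores, relevant, k=1)
r_at_2 = recall_at_k(scores, relevant, k=2)
```

Image-to-text retrieval uses images as queries and captions as candidates; text-to-image retrieval swaps the two roles, so the same function applies with the score matrix transposed and the relevance sets rebuilt per caption.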

Results

Performance results of various models on this benchmark

Comparison table
| Model name | Image-to-text R@1 | Image-to-text R@10 | Image-to-text R@5 | Text-to-image R@1 | Text-to-image R@10 | Text-to-image R@5 |
|---|---|---|---|---|---|---|
| polysemous-visual-semantic-embedding-for-1 | 45.2 | 84.5 | 74.3 | 32.4 | 75.0 | 63.0 |
| 3shnet-boosting-image-sentence-retrieval-via | 67.9 | 95.4 | 90.5 | 50.3 | 87.7 | 79.3 |
| toward-building-general-foundation-models-for | 84.2 | 98.4 | 96.4 | 67.0 | 92.4 | 87.2 |
| an-empirical-study-of-training-end-to-end | 76.16 | 96.82 | 93.16 | 57.08 | 90.07 | 82.66 |
| similarity-reasoning-and-filtration-for-image | 57.8 | 91.6 | 84.9 | 41.9 | 81.3 | 70.7 |
| lile-look-in-depth-before-looking-elsewhere-a | 55.6 | 91.0 | 82.4 | 41.5 | 82.2 | 72.1 |
| vilt-vision-and-language-transformer-without | 61.5 | 92.7 | 86.3 | 42.7 | 83.1 | 72.9 |
| position-guided-text-prompt-for-vision | 81.5 | 97.9 | 95.9 | 64.9 | 92.2 | 87.4 |
| deep-visual-semantic-alignments-for | 41.2 | 81.1 | 70.5 | 25.3 | 66.4 | 53.4 |
| dissecting-deep-metric-learning-losses-for | 81.4 | 97.9 | 95.6 | 63.6 | 91.5 | 86.0 |
| imram-iterative-matching-with-recurrent | 53.7 | 91.0 | 83.2 | 39.7 | 79.8 | 69.1 |
| visual-semantic-reasoning-for-image-text | 53.0 | 89.4 | 81.1 | 40.5 | 81.1 | 70.6 |
| florence-a-new-foundation-model-for-computer | 81.8 | - | 95.2 | 63.2 | - | 85.7 |
| dynamic-self-adaptive-multiscale-distillation | 48.0 | 84.5 | 75.6 | 62.1 | 92.0 | 85.9 |
| align-before-fuse-vision-and-language | 77.6 | 97.2 | 94.3 | 60.7 | 90.5 | 84.3 |
| x-2-vlm-all-in-one-pre-trained-model-for | 83.5 | 98.5 | 96.3 | 66.2 | 92.2 | 87.1 |
| omnivl-one-foundation-model-for-image | 82.1 | 98.1 | 95.9 | 64.8 | 91.6 | 86.1 |
| multi-grained-vision-language-pre-training | 81.2 | 98.2 | 95.6 | 63.4 | 91.5 | 85.8 |
| x-2-vlm-all-in-one-pre-trained-model-for | 84.4 | 98.5 | 96.5 | 67.7 | 92.5 | 87.5 |
| ernie-vil-2-0-multi-view-contrastive-learning | 77.4 | 97.1 | 93.6 | 59.5 | 90.1 | 83.4 |
| oscar-object-semantics-aligned-pre-training | 73.5 | 96.0 | 92.2 | 57.5 | 89.8 | 82.8 |
| visualsparta-sparse-transformer-fragment | - | - | - | 44.4 | 82.4 | 72.8 |
| vast-a-vision-audio-subtitle-text-omni-1 | - | - | - | 68.0 | 92.8 | 87.7 |
| plug-and-play-regulators-for-image-text | 61.3 | 92.6 | 86.1 | 44.3 | 83.2 | 73.2 |
| implicit-differentiable-outlier-detection | 80.7 | 96.8 | 95.1 | 62.9 | 92.8 | 84.8 |
| vision-language-pre-training-with-triple | 75.6 | 96.7 | 92.8 | 59.0 | 89.9 | 83.2 |
| napreg-nouns-as-proxies-regularization-for | 59.8 | - | - | 43.0 | - | - |
| Model 28 | 80.7 | 97.8 | 95.3 | 62.8 | 91 | 84.8 |
| vista-vision-and-scene-text-aggregation-for | 68.9 | 95.4 | 90.1 | 52.6 | 87.6 | 79.6 |
| aladin-distilling-fine-grained-alignment | 64.9 | 94.5 | 88.6 | 51.3 | 87.5 | 79.2 |
| stacked-cross-attention-for-image-text | 50.4 | 90.0 | 82.2 | 38.6 | 80.4 | 69.3 |
| learning-semantic-concepts-and-order-for | 42.8 | 83.0 | 72.3 | 33.1 | 75.5 | 62.9 |
| scaling-up-visual-and-vision-language | 77 | 96.9 | 93.5 | 59.9 | 89.8 | 83.3 |
| image-as-a-foreign-language-beit-pretraining | 84.8 | 98.3 | 96.5 | 67.2 | 87.7 | 92.8 |
| mammut-a-simple-architecture-for-joint | 70.7 | 93.7 | 89.1 | - | - | - |
| valor-vision-audio-language-omni-perception | - | - | - | 61.4 | 90.9 | 84.4 |