HyperAI

Zero-Shot Cross-Modal Retrieval on COCO 2014

Metrics

Image-to-text R@1
Image-to-text R@10
Image-to-text R@5
Text-to-image R@1
Text-to-image R@10
Text-to-image R@5
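Each metric is a Recall@K score: the percentage of queries for which a correct match appears among the top K retrieved items. Image-to-text uses images as queries and captions as candidates. Text-to-image reverses the roles. The sketch below is a minimal NumPy illustration. It assumes the model has already produced an image-by-caption similarity matrix and that each COCO image has several reference captions. The function names and toy data are hypothetical. It also assumes a common convention, not stated on this page, where an image counts as a hit if any of its own captions ranks in the top K.

```python
import numpy as np


def i2t_recall_at_k(sim, caption_to_image, k):
    """Image-to-text R@K (in percent).

    sim: (num_images, num_captions) similarity matrix.
    caption_to_image: length-num_captions array giving the image each caption describes.
    An image is a hit if any of its own captions is among its k most similar captions.
    """
    topk = np.argsort(-sim, axis=1)[:, :k]              # caption indices, best first
    owners = caption_to_image[topk]                     # image index owning each retrieved caption
    hits = owners == np.arange(sim.shape[0])[:, None]   # does the caption belong to the query image?
    return 100.0 * hits.any(axis=1).mean()


def t2i_recall_at_k(sim, caption_to_image, k):
    """Text-to-image R@K (in percent): a caption is a hit if its image is in its top k."""
    topk = np.argsort(-sim.T, axis=1)[:, :k]            # image indices per caption, best first
    hits = topk == caption_to_image[:, None]
    return 100.0 * hits.any(axis=1).mean()


# Toy example: 2 images, 4 captions (captions 0-1 describe image 0, captions 2-3 image 1).
sim = np.array([[0.9, 0.1, 0.8, 0.2],
                [0.3, 0.7, 0.6, 0.5]])
caption_to_image = np.array([0, 0, 1, 1])
print(i2t_recall_at_k(sim, caption_to_image, 1))  # 50.0
print(t2i_recall_at_k(sim, caption_to_image, 1))  # 50.0
```

Real benchmark numbers would use the full test split and K in {1, 5, 10}, matching the six columns of the table.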

Results

Performance of various models on this benchmark

Comparison table
| Model | Image-to-text R@1 | Image-to-text R@10 | Image-to-text R@5 | Text-to-image R@1 | Text-to-image R@10 | Text-to-image R@5 |
|---|---|---|---|---|---|---|
| coca-contrastive-captioners-are-image-text | 66.3 | 91.8 | 86.2 | 51.2 | 82.0 | 74.2 |
| learning-transferable-visual-models-from | 58.4 | 88.1 | 81.5 | 37.8 | 72.2 | 62.4 |
| vilt-vision-and-language-transformer-without | 56.5 | 89.6 | 82.6 | 40.4 | 81.1 | 70 |
| ernie-vil-2-0-multi-view-contrastive-learning | 63.1 | 91.4 | 85.7 | 46.0 | 80.4 | 71.4 |
| align-before-fuse-vision-and-language | 68.7 | 94.7 | 89.5 | 50.1 | 84.5 | 76.4 |
| cosmos-cross-modality-self-distillation-for | 64.3 | 92.0 | 86.5 | 48.4 | 82.6 | 74.2 |
| position-guided-text-prompt-for-vision | 69.7 | 94.7 | 90.0 | 49.5 | 84.2 | 75.9 |
| cosmos-cross-modality-self-distillation-for | 68.0 | 92.5 | 87.8 | 52.5 | 84.9 | 77.2 |
| imagebert-cross-modal-pre-training-with-large | 44.0 | 80.4 | 71.2 | 32.3 | 70.2 | 59.0 |
| internvl-scaling-up-vision-foundation-models | 70.6 | 93.5 | 89.0 | 54.1 | 84.6 | 77.3 |
| flamingo-a-visual-language-model-for-few-shot-1 | 65.9 | 92.9 | 87.3 | 48.0 | 82.1 | 73.3 |
| internvl-scaling-up-vision-foundation-models | 74.9 | 95.2 | 91.3 | 58.6 | 88.0 | 81.3 |
| scaling-up-visual-and-vision-language | 58.6 | 89.7 | 83.0 | 45.6 | 78.6 | 69.8 |
| Model 14000000 | | | | | | |
| florence-a-new-foundation-model-for-computer | 64.7 | - | 85.9 | 47.2 | - | 71.4 |
| boldsymbol-m-2-encoder-advancing-bilingual | 72.8 | 96.3 | 92.3 | 56.5 | 88.8 | 81.6 |
| region-aware-pretraining-for-open-vocabulary | 68.9 | 92.2 | 87.8 | 51.8 | 83.0 | 75.0 |
| vision-language-pre-training-with-triple | 71.4 | 95.4 | 90.8 | 53.5 | 87.1 | 79.0 |
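The top entry per metric can be read off the table above programmatically. The following is a small sketch, assuming the scores are hand-transcribed from that table. Missing entries ("-") become None, and the garbled row that has no scores is left out. The helper name `best_per_metric` is hypothetical. Duplicate model names are kept as they appear on the page.

```python
# Scores hand-transcribed from the comparison table; None marks a missing entry ("-").
METRICS = ["Image-to-text R@1", "Image-to-text R@10", "Image-to-text R@5",
           "Text-to-image R@1", "Text-to-image R@10", "Text-to-image R@5"]

ROWS = [
    ("coca-contrastive-captioners-are-image-text", [66.3, 91.8, 86.2, 51.2, 82.0, 74.2]),
    ("learning-transferable-visual-models-from", [58.4, 88.1, 81.5, 37.8, 72.2, 62.4]),
    ("vilt-vision-and-language-transformer-without", [56.5, 89.6, 82.6, 40.4, 81.1, 70.0]),
    ("ernie-vil-2-0-multi-view-contrastive-learning", [63.1, 91.4, 85.7, 46.0, 80.4, 71.4]),
    ("align-before-fuse-vision-and-language", [68.7, 94.7, 89.5, 50.1, 84.5, 76.4]),
    ("cosmos-cross-modality-self-distillation-for", [64.3, 92.0, 86.5, 48.4, 82.6, 74.2]),
    ("position-guided-text-prompt-for-vision", [69.7, 94.7, 90.0, 49.5, 84.2, 75.9]),
    ("cosmos-cross-modality-self-distillation-for", [68.0, 92.5, 87.8, 52.5, 84.9, 77.2]),
    ("imagebert-cross-modal-pre-training-with-large", [44.0, 80.4, 71.2, 32.3, 70.2, 59.0]),
    ("internvl-scaling-up-vision-foundation-models", [70.6, 93.5, 89.0, 54.1, 84.6, 77.3]),
    ("flamingo-a-visual-language-model-for-few-shot-1", [65.9, 92.9, 87.3, 48.0, 82.1, 73.3]),
    ("internvl-scaling-up-vision-foundation-models", [74.9, 95.2, 91.3, 58.6, 88.0, 81.3]),
    ("scaling-up-visual-and-vision-language", [58.6, 89.7, 83.0, 45.6, 78.6, 69.8]),
    ("florence-a-new-foundation-model-for-computer", [64.7, None, 85.9, 47.2, None, 71.4]),
    ("boldsymbol-m-2-encoder-advancing-bilingual", [72.8, 96.3, 92.3, 56.5, 88.8, 81.6]),
    ("region-aware-pretraining-for-open-vocabulary", [68.9, 92.2, 87.8, 51.8, 83.0, 75.0]),
    ("vision-language-pre-training-with-triple", [71.4, 95.4, 90.8, 53.5, 87.1, 79.0]),
]


def best_per_metric(rows, metrics):
    """Return {metric: (model, score)} for the highest reported score in each column."""
    best = {}
    for j, metric in enumerate(metrics):
        # Skip models that did not report this metric.
        candidates = [(name, scores[j]) for name, scores in rows if scores[j] is not None]
        best[metric] = max(candidates, key=lambda pair: pair[1])
    return best


for metric, (model, score) in best_per_metric(ROWS, METRICS).items():
    print(f"{metric}: {model} ({score})")
```

Skipping None values rather than treating them as 0 keeps models with partial results (Florence here) comparable on the metrics they did report.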