Zero-Shot Cross-Modal Retrieval on Flickr30K
Evaluation Metrics
Image-to-text R@1
Image-to-text R@10
Image-to-text R@5
Text-to-image R@1
Text-to-image R@10
Text-to-image R@5
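Each R@K score is the percentage of queries for which at least one correct match appears among the top K retrieved candidates. Flickr30K pairs every image with five captions. In the image-to-text direction, a query image therefore counts as a hit if any of its five captions is ranked in the top K. In the text-to-image direction, each caption has exactly one correct image. The sketch below shows one way to compute these numbers from precomputed embeddings. It is a minimal illustration, not the evaluation code of any model listed here: the function names `recall_at_k` and `flickr_style_recalls` are hypothetical, and the sketch assumes cosine similarity on L2-normalized embeddings and captions stored image-major (caption j belongs to image j // 5).

```python
import numpy as np

def recall_at_k(sim, gt, ks=(1, 5, 10)):
    """Hypothetical helper: Recall@K in percent.

    sim[i, j]: similarity between query i and candidate j (higher is better).
    gt[i]: set of candidate indices that count as correct for query i.
    """
    # Candidates for each query, sorted by descending similarity
    # (ties are broken in whatever order np.argsort returns).
    order = np.argsort(-sim, axis=1)
    out = {}
    for k in ks:
        # A query is a hit if any correct candidate is among its top k.
        hits = sum(
            bool(set(order[i, :k].tolist()) & gt[i]) for i in range(sim.shape[0])
        )
        out[k] = 100.0 * hits / sim.shape[0]
    return out

def flickr_style_recalls(img_emb, txt_emb, caps_per_image=5, ks=(1, 5, 10)):
    """Hypothetical helper: both retrieval directions from raw embeddings.

    Assumes caption j belongs to image j // caps_per_image (image-major order).
    """
    # L2-normalize so the dot product equals cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sim = img @ txt.T  # shape: (num_images, num_captions)
    n_img, n_txt = sim.shape
    # Image-to-text: every caption of the query image is a correct answer.
    i2t_gt = [set(range(i * caps_per_image, (i + 1) * caps_per_image))
              for i in range(n_img)]
    # Text-to-image: each caption has exactly one correct image.
    t2i_gt = [{j // caps_per_image} for j in range(n_txt)]
    return {
        "image_to_text": recall_at_k(sim, i2t_gt, ks),
        "text_to_image": recall_at_k(sim.T, t2i_gt, ks),
    }
```

For example, with `img_emb` of shape (N, d) and `txt_emb` of shape (5N, d), `flickr_style_recalls(img_emb, txt_emb)` returns six percentages that correspond to the six metrics listed above. Tie handling and caption ordering are assumptions here; published evaluation scripts may differ in both.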
Evaluation Results
Performance of each model on this benchmark
Comparison Table
Model | Image-to-text R@1 | Image-to-text R@10 | Image-to-text R@5 | Text-to-image R@1 | Text-to-image R@10 | Text-to-image R@5 |
---|---|---|---|---|---|---|
cosmos-cross-modality-self-distillation-for | 89.9 | 99.3 | 98.8 | 76.1 | 96.2 | 92.8 |
reproducible-scaling-laws-for-contrastive | - | - | 99.3 | - | - | 94.1 |
vilt-vision-and-language-transformer-without | 73.2 | 96.5 | 93.6 | 55.0 | 89.8 | 82.5 |
ernie-vil-2-0-multi-view-contrastive-learning | 91.2 | 99.8 | 99.1 | 77.4 | 96.4 | 93.8 |
align-before-fuse-vision-and-language | 90.5 | 99.7 | 98.8 | 76.8 | 96.7 | 93.7 |
scaling-up-visual-and-vision-language | 88.6 | 99.7 | 98.7 | 75.7 | 96.8 | 93.8 |
altclip-altering-the-language-encoder-in-clip | 86.0 | 99.1 | 98.0 | 72.5 | 95.4 | 91.6 |
internvl-scaling-up-vision-foundation-models | 95.7 | 99.9 | 99.7 | 85.0 | 98.6 | 97.0 |
cosmos-cross-modality-self-distillation-for | 92.9 | 99.9 | 99.4 | 80.3 | 97.6 | 95.3 |
implicit-differentiable-outlier-detection | 89.0 | 99.8 | 99.2 | 77.2 | 98.2 | 94.3 |
coca-contrastive-captioners-are-image-text | 92.5 | 99.9 | 99.5 | 80.4 | 97.7 | 95.7 |
internvl-scaling-up-vision-foundation-models | 94.7 | 99.9 | 99.6 | 81.7 | 98.2 | 96.0 |
image-as-a-foreign-language-beit-pretraining | 94.9 | 100.0 | 99.9 | 81.5 | 97.8 | 95.6 |
flamingo-a-visual-language-model-for-few-shot-1 | 89.3 | 99.7 | 98.8 | 79.5 | 97.9 | 95.3 |
position-guided-text-prompt-for-vision | 87.1 | 99.3 | 98.4 | 73.1 | 94.8 | 91.0 |
florence-a-new-foundation-model-for-computer | 90.9 | - | 99.1 | 76.7 | - | 93.6 |
boldsymbol-m-2-encoder-advancing-bilingual | 91.2 | 99.6 | 99.2 | 92.2 | 99.7 | 99.5 |
imagebert-cross-modal-pre-training-with-large | 70.7 | 94.0 | 90.2 | 54.3 | 87.5 | 79.6 |
vast-a-vision-audio-subtitle-text-omni-1 | - | - | - | 90.4 | - | - |
uniter-learning-universal-image-text-1 | 80.7 | 98.0 | 95.7 | 66.2 | 92.9 | 88.4 |
learning-transferable-visual-models-from | 88.0 | 99.4 | 98.7 | 68.7 | 95.2 | 90.6 |
region-aware-pretraining-for-open-vocabulary | 92.1 | 99.7 | 99.4 | 80.7 | 97.7 | 96.1 |