Image-to-Text Retrieval on Flickr30K
Metrics
Recall@1
Recall@5
Recall@10
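
For readers unfamiliar with the metric, the following is a minimal sketch of how Recall@K is commonly computed for image-to-text retrieval: each image is a query, all captions are ranked by similarity, and a query counts as a hit if at least one of its ground-truth captions appears in the top K. The function name `recall_at_k`, the similarity values, and the toy ground-truth sets are illustrative assumptions, not taken from any of the listed models or from an official evaluation script.

```python
from typing import Sequence, Set


def recall_at_k(scores: Sequence[Sequence[float]],
                relevant: Sequence[Set[int]],
                k: int) -> float:
    """Percentage of image queries with at least one relevant caption in the top k.

    scores[i][j] is an (assumed) similarity between image i and caption j.
    relevant[i] is the set of caption indices that describe image i.
    """
    hits = 0
    for row, gold in zip(scores, relevant):
        # Rank caption indices by descending similarity to this image.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        # The query is a hit if any ground-truth caption is among the top k.
        if gold.intersection(ranked[:k]):
            hits += 1
    return 100.0 * hits / len(scores)


# Toy example (hypothetical numbers): 2 images, 4 captions.
scores = [
    [0.9, 0.1, 0.3, 0.2],  # image 0, described by captions 0 and 1
    [0.8, 0.2, 0.5, 0.6],  # image 1, described by captions 2 and 3
]
relevant = [{0, 1}, {2, 3}]

print(recall_at_k(scores, relevant, k=1))  # 50.0
print(recall_at_k(scores, relevant, k=2))  # 100.0
```

In the toy data, image 1 ranks an unrelated caption first, so it only becomes a hit once K reaches 2. This is why Recall@5 and Recall@10 can never be lower than Recall@1 for the same model.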
Results
Recall@K scores, in percent, reported for various models on this benchmark
Comparison Table
Model Name | Recall@1 | Recall@5 | Recall@10 |
---|---|---|---|
internvl-scaling-up-vision-foundation-models | 97.9 | 100.0 | 100.0 |
hada-a-graph-based-amalgamation-framework-in | 87.3 | 98.0 | 99.2 |
blip-2-bootstrapping-language-image-pre | 96.9 | 100.0 | 100.0 |
hada-a-graph-based-amalgamation-framework-in | 92.6 | 99.3 | 99.9 |
align-before-fuse-vision-and-language | 95.9 | 99.8 | 100.0 |
internvl-scaling-up-vision-foundation-models | 97.2 | 100.0 | 100.0 |
a-deep-local-and-global-scene-graph-matching | 71.0 | 91.9 | 96.1 |
a-deep-local-and-global-scene-graph-matching | 76.4 | 94.3 | 97.3 |
blip-2-bootstrapping-language-image-pre | 97.6 | 100.0 | 100.0 |
one-peace-exploring-one-general | 97.6 | 100.0 | 100.0 |
ernie-vil-2-0-multi-view-contrastive-learning | 96.1 | 99.9 | 100.0 |