HyperAI

Image To Text Retrieval On Flickr30K

Metrics

Recall@1
Recall@10
Recall@5
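
For readers unfamiliar with these metrics, the sketch below shows one common way Recall@K is computed for image-to-text retrieval: each image is a query, all captions are ranked by similarity, and the query counts as a hit if at least one of its ground-truth captions appears in the top K. The function name `recall_at_k`, the toy similarity scores, and the percentage scaling are illustrative assumptions, not the evaluation code used by any model listed on this page.

```python
def recall_at_k(similarity, ground_truth, k):
    """Percentage of image queries with at least one correct caption in the top k.

    similarity:   list of rows, one per image; similarity[i][j] is the score
                  of caption j for image i (higher means more similar).
    ground_truth: list of sets; ground_truth[i] holds the indices of the
                  captions that describe image i.
    """
    hits = 0
    for scores, relevant in zip(similarity, ground_truth):
        # Rank caption indices by descending score and keep the top k.
        top_k = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
        if any(j in relevant for j in top_k):
            hits += 1
    # Reported as a percentage (assumed to match the scale of the table values).
    return 100.0 * hits / len(similarity)


# Toy data (hypothetical): 2 images, 4 captions, 2 correct captions per image.
similarity = [
    [0.9, 0.1, 0.3, 0.2],  # image 0: best match is caption 0 (correct)
    [0.2, 0.8, 0.1, 0.7],  # image 1: best match is caption 1 (incorrect)
]
ground_truth = [{0, 1}, {2, 3}]

r_at_1 = recall_at_k(similarity, ground_truth, 1)
r_at_2 = recall_at_k(similarity, ground_truth, 2)
```

On this toy data only image 0 has a correct caption at rank 1, while both images have one within the top 2, so Recall@K can only stay equal or rise as K grows.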

Results

Performance results of various models on this benchmark

Comparison Table
Model Name                                     Recall@1  Recall@10  Recall@5
internvl-scaling-up-vision-foundation-models   97.9      100        100
hada-a-graph-based-amalgamation-framework-in   87.3      99.2       98
blip-2-bootstrapping-language-image-pre        96.9      100        100
hada-a-graph-based-amalgamation-framework-in   92.6      99.9       99.3
align-before-fuse-vision-and-language          95.9      100.0      99.8
internvl-scaling-up-vision-foundation-models   97.2      100        100
a-deep-local-and-global-scene-graph-matching   71        96.1       91.9
a-deep-local-and-global-scene-graph-matching   76.4      97.3       94.3
blip-2-bootstrapping-language-image-pre        97.6      100        100
one-peace-exploring-one-general                97.6      100        100
ernie-vil-2-0-multi-view-contrastive-learning  96.1      100.0      99.9