Image-to-Text Retrieval on COCO
Metrics
Recall@1
Recall@5
Recall@10
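For reference, Recall@K is the fraction of query images whose ground-truth caption appears among the K highest-scoring retrieved captions. The sketch below is a minimal illustration, assuming a precomputed image-caption similarity matrix with one ground-truth caption per image aligned on the diagonal (on COCO each image actually has several reference captions, in which case retrieving any of them counts as a hit); the `recall_at_k` helper and the random scores are purely illustrative, not any listed model's output.

```python
import numpy as np

def recall_at_k(similarity: np.ndarray, k: int) -> float:
    """Recall@K for image-to-text retrieval.

    similarity[i, j] is the score between image i and caption j.
    Assumes caption i is the single ground-truth match for image i,
    i.e. the matrix is aligned on the diagonal (a simplification).
    """
    # Indices of the K highest-scoring captions for each image.
    top_k = np.argsort(-similarity, axis=1)[:, :k]
    # An image is a hit if its ground-truth caption index is in the top K.
    hits = (top_k == np.arange(similarity.shape[0])[:, None]).any(axis=1)
    return float(hits.mean())

# Toy usage: random scores for 100 images x 100 captions.
rng = np.random.default_rng(0)
sim = rng.standard_normal((100, 100))
for k in (1, 5, 10):
    print(f"Recall@{k}: {recall_at_k(sim, k):.3f}")
```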
Results
Performance results of various models on this benchmark
Comparison table
Model name | Recall@1 | Recall@5 | Recall@10
---|---|---|---
blip-2-bootstrapping-language-image-pre | 83.5 | 96.0 | 98.0
blip-2-bootstrapping-language-image-pre | 85.4 | 97.0 | 98.5
deep-visual-semantic-alignments-for | - | - | 74.8
one-peace-exploring-one-general | 84.1 | 96.3 | 98.3
unicoder-vl-a-universal-encoder-for-vision | - | - | 97.2
sigmoid-loss-for-language-image-pre-training | 70.6 | - | -
oscar-object-semantics-aligned-pre-training | - | - | 99.8
flava-a-foundational-language-and-vision | 42.74 | 76.76 | -
learning-relation-alignment-for-calibrated | 67.78 | 89.7 | 94.48
learning-transferable-visual-models-from | 58.4 | 81.5 | 88.1