Image Retrieval On Flickr30K
Evaluation Metrics
Recall@1
Recall@10
Recall@5
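
As a rough illustration of the metrics listed above, Recall@K is commonly computed as the percentage of queries whose ground-truth image appears among the top K retrieved results. The sketch below assumes one ground-truth image per caption and a precomputed caption-to-image similarity matrix. The function name `recall_at_k`, its arguments, and the toy data are hypothetical and do not come from any of the listed papers, and individual papers may differ in test split and tie handling.

```python
from typing import Sequence


def recall_at_k(
    scores: Sequence[Sequence[float]],
    gt_index: Sequence[int],
    k: int,
) -> float:
    """Percentage of queries whose ground-truth image ranks in the top k.

    scores[q][i] is a hypothetical similarity between caption q and image i;
    gt_index[q] is the index of the image that caption q describes.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(scores) != len(gt_index):
        raise ValueError("scores and gt_index must have the same length")
    if not scores:
        raise ValueError("at least one query is required")

    hits = 0
    for row, target in zip(scores, gt_index):
        # Zero-based rank: number of images scoring strictly higher than the target.
        # Ties are resolved in the target's favor (an assumption of this sketch).
        rank = sum(1 for s in row if s > row[target])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(scores)


# Toy example: 3 captions, 4 candidate images (made-up scores).
toy_scores = [
    [0.9, 0.1, 0.3, 0.2],  # ground-truth image 0 is ranked 1st
    [0.8, 0.5, 0.6, 0.1],  # ground-truth image 1 is ranked 3rd
    [0.2, 0.4, 0.1, 0.7],  # ground-truth image 2 is ranked 4th
]
toy_targets = [0, 1, 2]

for k in (1, 3, 4):
    print(f"Recall@{k}: {recall_at_k(toy_scores, toy_targets, k):.2f}")
```

On the real benchmark K would be 1, 5, and 10; the toy example uses smaller values only because it has four candidate images.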
Results
Performance of each model on this benchmark
Model | Recall@1 | Recall@10 | Recall@5 | Paper Title | Repository |
---|---|---|---|---|---|
HADA | 81.36 | 98.02 | 95.94 | HADA: A Graph-based Amalgamation Framework in Image-text Retrieval | |
LGSGM | 57.4 | 90.2 | 84.1 | A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval | |
BLIP-2 ViT-L (zero-shot, 1K test set) | 88.6 | 98.9 | 97.6 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
MaMMUT (ours) | 82.5 | 98 | 96 | MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks | |
VisualSparta | 57.4 | 88.1 | 82.0 | VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words | |
ALBEF | 79.76 | 97.72 | 95.3 | HADA: A Graph-based Amalgamation Framework in Image-text Retrieval | |
UNITER | 75.56 | 96.76 | 94.08 | HADA: A Graph-based Amalgamation Framework in Image-text Retrieval | |
BLIP-2 ViT-G (zero-shot, 1K test set) | 89.7 | 98.9 | 98.1 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
GSMN | - | 89 | 82.3 | A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval | |