HyperAI超神经

Image Retrieval On Flickr30K 1K Test

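Evaluation Metrics

R@1
R@10
R@5

R@K (Recall at K) is the percentage of text queries for which the correct image appears among the top K retrieved images.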
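The following Python sketch shows how R@K is commonly computed for text-to-image retrieval. It assumes a precomputed similarity matrix with one row per caption query and one column per candidate image, and exactly one correct image per query. The function name `recall_at_k` and its inputs are hypothetical. Individual papers in the table may differ in details such as tie handling.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]],
                ground_truth: Sequence[int],
                k: int) -> float:
    """Return R@K as a percentage (hypothetical helper, not from any listed paper).

    similarity[q][i] is the score between caption query q and image i;
    ground_truth[q] is the index of the correct image for query q.
    """
    hits = 0
    for scores, target in zip(similarity, ground_truth):
        # Rank of the correct image = number of images scored strictly higher.
        # Ties are resolved in favor of the correct image (an assumption).
        rank = sum(1 for s in scores if s > scores[target])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(ground_truth)


# Toy example: 3 caption queries, 3 candidate images.
sim = [[0.9, 0.1, 0.3],
       [0.2, 0.8, 0.7],
       [0.5, 0.6, 0.4]]
gt = [0, 2, 2]
for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(sim, gt, k):.1f}")
```

On the Flickr30K 1K test split, the matrix would commonly be 5,000 × 1,000, with 5 captions per test image. The same function then yields the R@1, R@5, and R@10 columns reported below.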

Evaluation Results

Performance of each model on this benchmark

| Model | R@1 | R@10 | R@5 | Paper Title |
|---|---|---|---|---|
| TERAN MrSw | 56.5 | 88.2 | 81.2 | Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders |
| DAN | 39.4 | 79.1 | 69.2 | Dual Attention Networks for Multimodal Reasoning and Matching |
| VisualSparta | 57.4 | 88.1 | 82.0 | VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words |
| 2WayNet (VGG) | 36.0 | - | - | Linking Image and Text with 2-Way Nets |
| DVSA (R-CNN, AlexNet) | 15.2 | 50.5 | - | Deep Visual-Semantic Alignments for Generating Image Descriptions |
| RCAR | 62.6 | 91.1 | 85.8 | Plug-and-Play Regulators for Image-Text Matching |
| SCAN i-t | 44.0 | 82.6 | 74.2 | Stacked Cross Attention for Image-Text Matching |
| HGLMM FV | 24.7 | 66.8 | 53.4 | Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models |
| SCO | 41.1 | 80.1 | 70.5 | Learning Semantic Concepts and Order for Image and Sentence Matching |
| CAMP | 51.5 | 85.3 | 77.1 | CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval |
| mCNN | 26.2 | 69.6 | 56.3 | Multimodal Convolutional Neural Networks for Matching Image and Sentence |
| LGSGM | 57.4 | 90.2 | 84.1 | A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval |
| SPE | 29.7 | 72.1 | 60.1 | Learning Deep Structure-Preserving Image-Text Embeddings |
| VSRN | 54.7 | 88.2 | 81.8 | Visual Semantic Reasoning for Image-Text Matching |
| SM-LSTM (VGG) | 30.2 | 72.3 | - | Instance-aware Image and Sentence Matching with Selective Multimodal LSTM |
| X-VLM (base) | 86.9 | 98.7 | 97.3 | Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts |
| SGRAF | 58.5 | 88.8 | 83.0 | Similarity Reasoning and Filtration for Image-Text Matching |
| TERAN Symm. | 55.7 | 89.3 | 83.1 | Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders |