HyperAI超神経
Image Retrieval On Flickr30K 1K Test
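Evaluation metrics
R@1
R@10
R@5
R@K (Recall@K) is the percentage of text queries for which the matching image is ranked among the top K retrieved images; higher is better.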
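As a sketch of how the R@K metrics listed above are computed, the Python function below ranks every image for each text query and counts how often the matching image lands in the top K. It assumes a precomputed query-by-image similarity matrix and exactly one ground-truth image per query (in the usual Flickr30K 1K test protocol, each of the 1,000 test images has five captions). The names `recall_at_k`, `sims`, and `gt_image` are hypothetical, not taken from any model on this leaderboard, and ties are resolved in the query's favor as a simplifying assumption.

```python
from typing import Sequence


def recall_at_k(sims: Sequence[Sequence[float]], gt_image: Sequence[int], k: int) -> float:
    """Return Recall@K as a percentage (hypothetical helper).

    sims[q][i] is the similarity between text query q and image i;
    gt_image[q] is the index of the one image that query q describes.
    """
    if len(sims) != len(gt_image):
        raise ValueError("need exactly one ground-truth image index per query")
    hits = 0
    for row, gt in zip(sims, gt_image):
        # 0-based rank of the ground-truth image: the number of images scored
        # strictly higher than it (ties count in the query's favor).
        rank = sum(1 for score in row if score > row[gt])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sims)


# Toy example: 2 images with 2 captions each (Flickr30K uses 5 per image).
sims = [
    [0.9, 0.1],  # caption 0 describes image 0, ranked 1st
    [0.4, 0.6],  # caption 1 describes image 0, ranked 2nd
    [0.2, 0.8],  # caption 2 describes image 1, ranked 1st
    [0.7, 0.3],  # caption 3 describes image 1, ranked 2nd
]
gt = [q // 2 for q in range(len(sims))]  # [0, 0, 1, 1]
print(recall_at_k(sims, gt, 1))  # 50.0
print(recall_at_k(sims, gt, 2))  # 100.0
```

Counting the images that score higher than the ground truth avoids a full sort per query; a production evaluator would typically vectorize this over the whole similarity matrix.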
Evaluation results
Performance of each model on this benchmark
| Model | R@1 | R@10 | R@5 | Paper title |
|---|---|---|---|---|
| TERAN MrSw | 56.5 | 88.2 | 81.2 | Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders |
| DAN | 39.4 | 79.1 | 69.2 | Dual Attention Networks for Multimodal Reasoning and Matching |
| VisualSparta | 57.4 | 88.1 | 82.0 | VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words |
| 2WayNet (VGG) | 36.0 | - | - | Linking Image and Text with 2-Way Nets |
| DVSA (R-CNN, AlexNet) | 15.2 | 50.5 | - | Deep Visual-Semantic Alignments for Generating Image Descriptions |
| RCAR | 62.6 | 91.1 | 85.8 | Plug-and-Play Regulators for Image-Text Matching |
| SCAN i-t | 44.0 | 82.6 | 74.2 | Stacked Cross Attention for Image-Text Matching |
| HGLMM FV | 24.7 | 66.8 | 53.4 | Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models |
| SCO | 41.1 | 80.1 | 70.5 | Learning Semantic Concepts and Order for Image and Sentence Matching |
| CAMP | 51.5 | 85.3 | 77.1 | CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval |
| mCNN | 26.2 | 69.6 | 56.3 | Multimodal Convolutional Neural Networks for Matching Image and Sentence |
| LGSGM | 57.4 | 90.2 | 84.1 | A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval |
| SPE | 29.7 | 72.1 | 60.1 | Learning Deep Structure-Preserving Image-Text Embeddings |
| VSRN | 54.7 | 88.2 | 81.8 | Visual Semantic Reasoning for Image-Text Matching |
| SM-LSTM (VGG) | 30.2 | 72.3 | - | Instance-aware Image and Sentence Matching with Selective Multimodal LSTM |
| X-VLM (base) | 86.9 | 98.7 | 97.3 | Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts |
| SGRAF | 58.5 | 88.8 | 83.0 | Similarity Reasoning and Filtration for Image-Text Matching |
| TERAN Symm. | 55.7 | 89.3 | 83.1 | Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders |