Image Retrieval On Flickr30K 1K Test
Metrics
R@1
R@10
R@5
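
R@K (Recall at K) is the percentage of text queries for which the correct image appears among the top K retrieved images. The sketch below shows one common way to compute it from a query-by-image similarity matrix. It assumes exactly one ground-truth image per query; the function name `recall_at_k` and the toy scores are illustrative and do not come from any of the listed papers, whose evaluation protocols may differ in details such as tie handling.

```python
import numpy as np


def recall_at_k(similarity, gt_index, ks=(1, 5, 10)):
    """Recall@K in percent for text-to-image retrieval.

    similarity: (num_queries, num_images) score matrix, higher = better match.
    gt_index:   (num_queries,) index of the correct image for each query.
    Assumption: one correct image per query (hypothetical setup).
    """
    similarity = np.asarray(similarity, dtype=float)
    gt_index = np.asarray(gt_index)
    # Score assigned to the correct image of each query.
    gt_scores = similarity[np.arange(len(gt_index)), gt_index]
    # Rank of the correct image = number of images scored strictly higher
    # (0 means it was retrieved first).
    ranks = (similarity > gt_scores[:, None]).sum(axis=1)
    return {k: 100.0 * float(np.mean(ranks < k)) for k in ks}


# Toy example: 3 queries, 3 images, made-up scores.
toy_similarity = [
    [0.9, 0.1, 0.3],  # correct image 0 is ranked first
    [0.2, 0.4, 0.8],  # correct image 1 is ranked second
    [0.5, 0.6, 0.1],  # correct image 2 is ranked third
]
toy_recall = recall_at_k(toy_similarity, gt_index=[0, 1, 2], ks=(1, 2, 3))
print(toy_recall)
```

On the toy data, one of three queries is answered correctly at rank 1, two of three within the top 2, and all three within the top 3. The values in the table are percentages of this kind, computed over the text queries of the 1K test split.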
Results
Performance results of various models on this benchmark
| Model Name | R@1 | R@10 | R@5 | Paper Title |
| --- | --- | --- | --- | --- |
| X-VLM (base) | 86.9 | 98.7 | 97.3 | Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts |
| RCAR | 62.6 | 91.1 | 85.8 | Plug-and-Play Regulators for Image-Text Matching |
| SGRAF | 58.5 | 88.8 | 83.0 | Similarity Reasoning and Filtration for Image-Text Matching |
| VisualSparta | 57.4 | 88.1 | 82.0 | VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words |
| LGSGM | 57.4 | 90.2 | 84.1 | A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval |
| TERAN MrSw | 56.5 | 88.2 | 81.2 | Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders |
| TERAN Symm. | 55.7 | 89.3 | 83.1 | Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders |
| VSRN | 54.7 | 88.2 | 81.8 | Visual Semantic Reasoning for Image-Text Matching |
| CAMP | 51.5 | 85.3 | 77.1 | CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval |
| SCAN i-t | 44.0 | 82.6 | 74.2 | Stacked Cross Attention for Image-Text Matching |
| SCO | 41.1 | 80.1 | 70.5 | Learning Semantic Concepts and Order for Image and Sentence Matching |
| DAN | 39.4 | 79.1 | 69.2 | Dual Attention Networks for Multimodal Reasoning and Matching |
| 2WayNet (VGG) | 36.0 | - | - | Linking Image and Text with 2-Way Nets |
| SM-LSTM (VGG) | 30.2 | 72.3 | - | Instance-aware Image and Sentence Matching with Selective Multimodal LSTM |
| SPE | 29.7 | 72.1 | 60.1 | Learning Deep Structure-Preserving Image-Text Embeddings |
| mCNN | 26.2 | 69.6 | 56.3 | Multimodal Convolutional Neural Networks for Matching Image and Sentence |
| HGLMM FV | 24.7 | 66.8 | 53.4 | Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models |
| DVSA (R-CNN, AlexNet) | 15.2 | 50.5 | - | Deep Visual-Semantic Alignments for Generating Image Descriptions |