HyperAI超神経
Phrase Grounding On Flickr30K Entities Test
Evaluation metric
R@1
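The page does not define R@1. The following is a minimal sketch, assuming the convention commonly used for Flickr30K Entities: a phrase counts as correctly grounded when the model's top-ranked box has an intersection over union (IoU) of at least 0.5 with the ground-truth box, and R@1 is the percentage of phrases grounded correctly. The papers listed here may differ in protocol details (for example, how multiple ground-truth boxes for one phrase are handled), so this is not necessarily how each score was computed. The names `iou` and `recall_at_1` are illustrative, not taken from any listed repository.

```python
from typing import Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(
    top1_boxes: Sequence[Box],
    gt_boxes: Sequence[Box],
    threshold: float = 0.5,  # assumed IoU threshold
) -> float:
    """Percentage of phrases whose top-ranked box matches the ground truth."""
    if len(top1_boxes) != len(gt_boxes):
        raise ValueError("need exactly one prediction per ground-truth phrase")
    if not gt_boxes:
        return 0.0
    hits = sum(iou(p, g) >= threshold for p, g in zip(top1_boxes, gt_boxes))
    return 100.0 * hits / len(gt_boxes)
```

Each entry pairs one phrase's highest-scoring predicted box with that phrase's ground-truth box, so the result is on the same 0 to 100 scale as the scores listed on this page.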
Evaluation results
Performance results of each model on this benchmark
| Model | R@1 | Paper Title | Repository |
| --- | --- | --- | --- |
| MCB | 48.69 | Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding | |
| DIGN | 78.73 | Disentangled Motif-aware Graph Learning for Phrase Grounding | - |
| VisualBERT | 71.33 | VisualBERT: A Simple and Performant Baseline for Vision and Language | |
| LCMCG | 76.74 | Learning Cross-modal Context Graph for Visual Grounding | |
| Soft-Label Chain CRF (SL-CCRF) | 74.69 | Phrase Grounding by Soft-Label Chain Conditional Random Field | |
| SCRC | 27.8 | Natural Language Object Retrieval | |
| DDPN (ResNet-101) | 73.3 | Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding | |
| DSPE | 43.89 | Learning Deep Structure-Preserving Image-Text Embeddings | - |
| GLIPv2 | 87.7 | GLIPv2: Unifying Localization and Vision-Language Understanding | |
| CCA | 25.30 | Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models | |
| CCA - Fast RCNN | 41.77 | Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models | |
| MDETR-ENB5 | 84.3 | MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | |
| GLIP | 87.1 | Grounded Language-Image Pre-training | |
| PEVL | 84.4 | PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models | |
| CCA - VGG19 | 30.83 | Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models | |
| BAN (Bottom-Up detector) | 69.69 | Bilinear Attention Networks | |
| FIBER-B | 87.4 | Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone | |
| GroundeR 100.0% annot. | 48.38 | Grounding of Textual Phrases in Images by Reconstruction | |