HyperAI

Phrase Grounding On Flickr30K Entities Test

Metrics

R@1
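
For readers unfamiliar with the metric, the following is a minimal sketch of how R@1 (recall at rank 1) is commonly computed for phrase grounding. It assumes the usual convention that a phrase counts as correctly grounded when the top-ranked predicted box has an intersection-over-union (IoU) of at least 0.5 with the ground-truth box. The helper names `iou` and `recall_at_1` and the 0.5 threshold are illustrative assumptions, and individual entries on this leaderboard may follow slightly different evaluation protocols.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top_preds: List[Box], gts: List[Box], thr: float = 0.5) -> float:
    """Percentage of phrases whose top-ranked box matches the ground truth.

    top_preds[i] is the highest-scoring predicted box for phrase i and
    gts[i] is its ground-truth box (assumed: one box per phrase).
    """
    if len(top_preds) != len(gts):
        raise ValueError("top_preds and gts must have the same length")
    if not gts:
        return 0.0
    hits = sum(1 for p, g in zip(top_preds, gts) if iou(p, g) >= thr)
    return 100.0 * hits / len(gts)


# Hypothetical toy data: the first phrase is grounded exactly, the second is missed.
preds = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0)]
truth = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
score = recall_at_1(preds, truth)
```

The result is expressed as a percentage so that it is on the same scale as the scores listed in the table on this page.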

Results

Performance results of various models on this benchmark

Comparison table
Model name                                     R@1
multimodal-compact-bilinear-pooling-for        48.69
disentangled-motif-aware-graph-learning-for    78.73
visualbert-a-simple-and-performant-baseline    71.33
learning-cross-modal-context-graph-for-visual  76.74
phrase-grounding-by-soft-label-chain           74.69
natural-language-object-retrieval              27.8
rethinking-diversified-and-discriminative      73.3
learning-deep-structure-preserving-image-text  43.89
glipv2-unifying-localization-and-vision        87.7
flickr30k-entities-collecting-region-to        25.30
flickr30k-entities-collecting-region-to        41.77
mdetr-modulated-detection-for-end-to-end       84.3
grounded-language-image-pre-training           87.1
pevl-position-enhanced-pre-training-and        84.4
flickr30k-entities-collecting-region-to        30.83
bilinear-attention-networks                    69.69
coarse-to-fine-vision-language-pre-training    87.4
grounding-of-textual-phrases-in-images-by      48.38