Phrase Grounding On Flickr30K Entities Test
Metrics
R@1
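R@1 (Recall@1) for phrase grounding on Flickr30K Entities is commonly defined as the percentage of test phrases whose top-ranked predicted box overlaps a ground-truth box with IoU ≥ 0.5. The snippet below is a minimal sketch of that computation, not the official evaluation code. The function names `iou` and `recall_at_1` and the `(x1, y1, x2, y2)` box format are hypothetical choices made for illustration. The sketch counts a hit when the prediction matches any ground-truth box for the phrase. This is an assumption: some evaluation protocols instead merge all boxes of a phrase into one, so scores from different papers may not be strictly comparable.

```python
from typing import Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so non-overlapping boxes give no intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(
    top1_preds: Sequence[Box],
    gt_boxes: Sequence[Sequence[Box]],
    thresh: float = 0.5,
) -> float:
    """Percentage of phrases whose top-1 box hits any ground-truth box.

    Assumption: a hit is IoU >= thresh with at least one ground-truth
    box of the phrase ("any-box" style matching).
    """
    if len(top1_preds) != len(gt_boxes):
        raise ValueError("need exactly one ground-truth list per phrase")
    if not top1_preds:
        return 0.0
    hits = sum(
        any(iou(pred, gt) >= thresh for gt in gts)
        for pred, gts in zip(top1_preds, gt_boxes)
    )
    return 100.0 * hits / len(top1_preds)


# Usage: two phrases. The first is grounded exactly and the second misses,
# so R@1 is 50.0.
preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
gts = [[(0, 0, 10, 10)], [(20, 20, 30, 30)]]
print(recall_at_1(preds, gts))  # 50.0
```

The score is returned as a percentage so it reads on the same scale as the R@1 column in the comparison table.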
Results
Performance of different models on this benchmark
Comparison Table
Model Name | R@1 |
---|---|
multimodal-compact-bilinear-pooling-for | 48.69 |
disentangled-motif-aware-graph-learning-for | 78.73 |
visualbert-a-simple-and-performant-baseline | 71.33 |
learning-cross-modal-context-graph-for-visual | 76.74 |
phrase-grounding-by-soft-label-chain | 74.69 |
natural-language-object-retrieval | 27.8 |
rethinking-diversified-and-discriminative | 73.3 |
learning-deep-structure-preserving-image-text | 43.89 |
glipv2-unifying-localization-and-vision | 87.7 |
flickr30k-entities-collecting-region-to | 25.30 |
flickr30k-entities-collecting-region-to | 41.77 |
mdetr-modulated-detection-for-end-to-end | 84.3 |
grounded-language-image-pre-training | 87.1 |
pevl-position-enhanced-pre-training-and | 84.4 |
flickr30k-entities-collecting-region-to | 30.83 |
bilinear-attention-networks | 69.69 |
coarse-to-fine-vision-language-pre-training | 87.4 |
grounding-of-textual-phrases-in-images-by | 48.38 |