Visual Question Answering on VQA v2 test-std
Evaluation Metrics
overall
Evaluation Results
Performance of each model on this benchmark
Comparison Table
Model Name | overall |
---|---|
lxmert-learning-cross-modality-encoder | 72.5 |
sparse-and-continuous-attention-mechanisms | 66.27 |
visualbert-a-simple-and-performant-baseline | 71 |
x-2-vlm-all-in-one-pre-trained-model-for | 81.8 |
tips-and-tricks-for-visual-question-answering | 70.3 |
making-the-v-in-vqa-matter-elevating-the-role | 62.27 |
bottom-up-and-top-down-attention-for-image | 70.34 |
prompt-tuning-for-generative-multimodal | 78.53 |
deep-modular-co-attention-networks-for-visual-1 | 70.9 |
image-as-a-foreign-language-beit-pretraining | 84.03 |
vlmo-unified-vision-language-pre-training | 81.30 |
valor-vision-audio-language-omni-perception | 78.62 |
block-bilinear-superdiagonal-fusion-for | 67.9 |
mplug-effective-and-efficient-vision-language | 83.62 |
learning-to-count-objects-in-natural-images | 68.4 |
graph-reasoning-networks-for-visual-question | 75.92 |
simvlm-simple-visual-language-model | 80.34 |
vl-bert-pre-training-of-generic-visual | 72.2 |
in-defense-of-grid-features-for-visual | 74.16 |
deep-multimodal-neural-architecture-search | 73.86 |
prismer-a-vision-language-model-with-an | 78.49 |
mutan-multimodal-tucker-fusion-for-visual | 67.4 |
vinvl-making-visual-representations-matter-in | 76.63 |
x-2-vlm-all-in-one-pre-trained-model-for | 80.2 |
vinvl-making-visual-representations-matter-in | 77.45 |
in-defense-of-grid-features-for-visual | 72.71 |
uniter-learning-universal-image-text-1 | 73.4 |
Model 28 | 80.19 |
murel-multimodal-relational-reasoning-for | 68.4 |
bilinear-attention-networks | 70.4 |
ernie-vil-knowledge-enhanced-vision-language | 74.93 |
one-peace-exploring-one-general | 82.52 |
making-the-v-in-vqa-matter-elevating-the-role | 25.98 |
making-the-v-in-vqa-matter-elevating-the-role | 44.26 |
unified-vision-language-pre-training-for | 70.7 |
visual-commonsense-r-cnn | 71.49 |
align-before-fuse-vision-and-language | 76.04 |
190600513 | 69.7 |