HyperAI

Visual Question Answering on VQA v2 test-std

Evaluation Metric

overall

Evaluation Results

Performance results of each model on this benchmark

Comparison Table

| Model Name | overall |
| --- | --- |
| lxmert-learning-cross-modality-encoder | 72.5 |
| sparse-and-continuous-attention-mechanisms | 66.27 |
| visualbert-a-simple-and-performant-baseline | 71 |
| x-2-vlm-all-in-one-pre-trained-model-for | 81.8 |
| tips-and-tricks-for-visual-question-answering | 70.3 |
| making-the-v-in-vqa-matter-elevating-the-role | 62.27 |
| bottom-up-and-top-down-attention-for-image | 70.34 |
| prompt-tuning-for-generative-multimodal | 78.53 |
| deep-modular-co-attention-networks-for-visual-1 | 70.9 |
| image-as-a-foreign-language-beit-pretraining | 84.03 |
| vlmo-unified-vision-language-pre-training | 81.30 |
| valor-vision-audio-language-omni-perception | 78.62 |
| block-bilinear-superdiagonal-fusion-for | 67.9 |
| mplug-effective-and-efficient-vision-language | 83.62 |
| learning-to-count-objects-in-natural-images | 68.4 |
| graph-reasoning-networks-for-visual-question | 75.92 |
| simvlm-simple-visual-language-model | 80.34 |
| vl-bert-pre-training-of-generic-visual | 72.2 |
| in-defense-of-grid-features-for-visual | 74.16 |
| deep-multimodal-neural-architecture-search | 73.86 |
| prismer-a-vision-language-model-with-an | 78.49 |
| mutan-multimodal-tucker-fusion-for-visual | 67.4 |
| vinvl-making-visual-representations-matter-in | 76.63 |
| x-2-vlm-all-in-one-pre-trained-model-for | 80.2 |
| vinvl-making-visual-representations-matter-in | 77.45 |
| in-defense-of-grid-features-for-visual | 72.71 |
| uniter-learning-universal-image-text-1 | 73.4 |
| Model 28 | 80.19 |
| murel-multimodal-relational-reasoning-for | 68.4 |
| bilinear-attention-networks | 70.4 |
| ernie-vil-knowledge-enhanced-vision-language | 74.93 |
| one-peace-exploring-one-general | 82.52 |
| making-the-v-in-vqa-matter-elevating-the-role | 25.98 |
| making-the-v-in-vqa-matter-elevating-the-role | 44.26 |
| unified-vision-language-pre-training-for | 70.7 |
| visual-commonsense-r-cnn | 71.49 |
| align-before-fuse-vision-and-language | 76.04 |
| 190600513 | 69.7 |