Visual Question Answering (VQA) On
Evaluation Metric
ANLS
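ANLS (Average Normalized Levenshtein Similarity) scores each predicted answer by its edit-distance similarity to the closest ground-truth answer, zeroes out any similarity whose normalized distance reaches a threshold, and averages over all questions. The following is a minimal sketch of that computation, not the benchmark's official scorer. The function names `levenshtein` and `anls`, the default threshold of 0.5, and the lowercasing and whitespace stripping are assumptions based on the commonly used definition. The scores in the table appear to be on a 0 to 100 scale, so the value returned here would be multiplied by 100 to compare.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with a two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def anls(predictions, references, threshold=0.5):
    """Mean over questions of the best thresholded similarity.

    predictions: list of predicted answer strings.
    references:  list of lists of accepted answers, one list per question.
    threshold:   assumed cutoff; a normalized distance at or above it scores 0.
    """
    if not predictions:
        return 0.0
    total = 0.0
    for pred, golds in zip(predictions, references):
        p = pred.strip().lower()  # assumed normalization
        best = 0.0
        for gold in golds:
            g = gold.strip().lower()
            longest = max(len(p), len(g))
            nl = levenshtein(p, g) / longest if longest else 0.0
            score = 1.0 - nl if nl < threshold else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)
```

The threshold is what separates a near miss, such as an OCR slip of one character, from a wrong answer: the former keeps partial credit, the latter gets none.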
Evaluation Results
Performance of each model on this benchmark
Comparison Table
Model Name | ANLS |
---|---|
layout-and-task-aware-instruction-prompt-for | 48.98 |
gemini-a-family-of-highly-capable-multimodal-1 | 80.3 |
dublin-document-understanding-by-language | 36.82 |
pali-x-on-scaling-up-a-multilingual-vision | 49.2 |
pix2struct-screenshot-parsing-as-pretraining | 38.2 |
pali-3-vision-language-models-smaller-faster | 57.8 |
dublin-document-understanding-by-language | 42.6 |
pali-x-on-scaling-up-a-multilingual-vision | 50.7 |
pali-x-on-scaling-up-a-multilingual-vision | 54.8 |
omni-smola-boosting-generalist-multimodal | 66.2 |
pix2struct-screenshot-parsing-as-pretraining | 40 |
docformerv2-local-features-for-document | 48.8 |
matcha-enhancing-visual-language-pretraining | 37.2 |
lapdoc-layout-aware-prompting-for-documents | 54.9 |
layout-and-task-aware-instruction-prompt-for | 54.51 |
going-full-tilt-boogie-on-document | 61.20 |
unifying-vision-text-and-layout-for-universal | 47.4 |
screenai-a-vision-language-model-for-ui-and | 65.90 |
omni-smola-boosting-generalist-multimodal | 65.6 |
pali-3-vision-language-models-smaller-faster | 62.4 |
unifying-vision-text-and-layout-for-universal | 63.0 |