HyperAI

Visual Question Answering (VQA) On

Metrics

ANLS
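ANLS (Average Normalized Levenshtein Similarity) is the only metric reported on this page. The following is a minimal sketch of how it is commonly computed, assuming the usual threshold of 0.5 and case-insensitive comparison after trimming whitespace. The function names `levenshtein` and `anls` are illustrative and are not taken from this page or from any official evaluation script. The values in the table appear to be this score scaled to the 0-100 range.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions: List[str],
         references: List[List[str]],
         threshold: float = 0.5) -> float:
    """Average over questions of the best per-answer similarity.

    Assumption: a normalized distance at or above `threshold`
    scores 0 for that answer, as in the usual ANLS definition.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not predictions:
        return 0.0
    total = 0.0
    for pred, answers in zip(predictions, references):
        pred_norm = pred.strip().lower()
        best = 0.0
        for answer in answers:
            ans_norm = answer.strip().lower()
            longest = max(len(pred_norm), len(ans_norm))
            nl = levenshtein(pred_norm, ans_norm) / longest if longest else 0.0
            score = 1.0 - nl if nl < threshold else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)
```

For example, `100 * anls(model_answers, ground_truth_answers)` would give a number on the same scale as the table, where `model_answers` holds one string per question and `ground_truth_answers` holds the list of accepted answers for each question (both names are hypothetical).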

Results

Performance results of various models on this benchmark

Comparison table
Model name | ANLS
layout-and-task-aware-instruction-prompt-for | 48.98
gemini-a-family-of-highly-capable-multimodal-1 | 80.3
dublin-document-understanding-by-language | 36.82
pali-x-on-scaling-up-a-multilingual-vision | 49.2
pix2struct-screenshot-parsing-as-pretraining | 38.2
pali-3-vision-language-models-smaller-faster | 57.8
dublin-document-understanding-by-language | 42.6
pali-x-on-scaling-up-a-multilingual-vision | 50.7
pali-x-on-scaling-up-a-multilingual-vision | 54.8
omni-smola-boosting-generalist-multimodal | 66.2
pix2struct-screenshot-parsing-as-pretraining | 40
docformerv2-local-features-for-document | 48.8
matcha-enhancing-visual-language-pretraining | 37.2
lapdoc-layout-aware-prompting-for-documents | 54.9
layout-and-task-aware-instruction-prompt-for | 54.51
going-full-tilt-boogie-on-document | 61.20
unifying-vision-text-and-layout-for-universal | 47.4
screenai-a-vision-language-model-for-ui-and | 65.90
omni-smola-boosting-generalist-multimodal | 65.6
pali-3-vision-language-models-smaller-faster | 62.4
unifying-vision-text-and-layout-for-universal | 63.0