Visual Question Answering on DocVQA Test
Metrics
ANLS
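ANLS (Average Normalized Levenshtein Similarity) is the only metric reported here, so a minimal sketch of how it is commonly computed may help when reading the scores. The sketch assumes the usual threshold of 0.5 and case-insensitive comparison after trimming whitespace; the function names `levenshtein` and `anls` are illustrative and not taken from any official evaluation script, which may normalize answers differently.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions: Sequence[str],
         references: Sequence[List[str]],
         threshold: float = 0.5) -> float:
    """Average over questions of the best per-answer similarity.

    Assumption: threshold 0.5 and lowercase/strip normalization.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not predictions:
        return 0.0
    total = 0.0
    for pred, answers in zip(predictions, references):
        p = pred.strip().lower()
        best = 0.0
        for answer in answers:
            g = answer.strip().lower()
            longest = max(len(p), len(g))
            # Normalized Levenshtein distance in [0, 1]
            nl = 0.0 if longest == 0 else levenshtein(p, g) / longest
            # Similarities at or beyond the threshold distance count as 0
            score = 1.0 - nl if nl < threshold else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)
```

Under these assumptions, an exact match scores 1, a near miss such as a single-character OCR error receives partial credit, and an answer whose normalized distance reaches the threshold scores 0.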
Results
Performance results of various models on this benchmark
Comparison table
Model name | ANLS |
---|---|
matcha-enhancing-visual-language-pretraining | 0.742 |
layout-and-task-aware-instruction-prompt-for | 0.884 |
pali-3-vision-language-models-smaller-faster | 0.876 |
qwen-vl-a-frontier-large-vision-language | 0.651 |
ernie-layout-layout-knowledge-enhanced-pre | 0.8486 |
dublin-document-understanding-by-language | 0.782 |
pix2struct-screenshot-parsing-as-pretraining | 0.721 |
dublin-document-understanding-by-language | 0.803 |
pali-3-vision-language-models-smaller-faster | 0.886 |
qwen-vl-a-frontier-large-vision-language | 0.9024 |
pali-x-on-scaling-up-a-multilingual-vision | 0.868 |
pali-x-on-scaling-up-a-multilingual-vision | 0.80 |
layout-and-task-aware-instruction-prompt-for | 0.8336 |
going-full-tilt-boogie-on-document | 0.8705 |
docvqa-a-dataset-for-vqa-on-document-images | 0.665 |
docformerv2-local-features-for-document | 0.8784 |
multi-label-cluster-discrimination-for-visual | 0.916 |
unifying-vision-text-and-layout-for-universal | 0.878 |
unifying-vision-text-and-layout-for-universal | 0.847 |
omni-smola-boosting-generalist-multimodal | 0.906 |
ernie-layout-layout-knowledge-enhanced-pre | 0.8841 |
going-full-tilt-boogie-on-document | 0.8392 |
layoutlmv2-multi-modal-pre-training-for | 0.8672 |
omni-smola-boosting-generalist-multimodal | 0.908 |
donut-document-understanding-transformer | 0.675 |
end-to-end-document-recognition-and | 0.632 |
docvqa-a-dataset-for-vqa-on-document-images | 0.9436 |
qwen-vl-a-frontier-large-vision-language | 0.626 |
layoutlmv2-multi-modal-pre-training-for | 0.7808 |
pali-x-on-scaling-up-a-multilingual-vision | 0.809 |
layout-and-task-aware-instruction-prompt-for | 0.8255 |
screenai-a-vision-language-model-for-ui-and | 0.8988 |
pix2struct-screenshot-parsing-as-pretraining | 0.766 |