DUBLIN (variable resolution) | 0.803 | DUBLIN -- Document Understanding By Language-Image Network | - |
PaLI-X (Single-task FT w/ OCR) | 0.868 | PaLI-X: On Scaling up a Multilingual Vision and Language Model | |
BERT_LARGE_SQUAD_DOCVQA_FINETUNED_Baseline | 0.665 | DocVQA: A Dataset for VQA on Document Images | |