HyperAI
Visual Question Answering on DocVQA Test
Metrics
ANLS
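ANLS (Average Normalized Levenshtein Similarity) gives partial credit to answers that are close to a ground-truth string, which tolerates small OCR or spelling slips. The following is a minimal sketch of the commonly used definition, assuming a threshold of 0.5 and case-insensitive, whitespace-trimmed comparison. The function names `levenshtein` and `anls` are illustrative, and an official evaluation script may normalize strings differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def anls(predictions, references, threshold=0.5):
    """Mean over questions of the best per-answer similarity.

    predictions: list of predicted strings, one per question.
    references:  list of lists of accepted answers, one list per question.
    threshold:   assumed cutoff; normalized distances at or above it score 0.
    """
    if not predictions:
        return 0.0
    total = 0.0
    for pred, answers in zip(predictions, references):
        p = pred.strip().lower()
        best = 0.0
        for ans in answers:
            a = ans.strip().lower()
            longest = max(len(p), len(a))
            nl = levenshtein(p, a) / longest if longest else 0.0
            score = 1.0 - nl if nl < threshold else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)
```

For example, predicting "helo" when the accepted answer is "hello" has an edit distance of 1 over a maximum length of 5, so that question scores 0.8 instead of 0.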
Results
Performance of different models on this benchmark
| Model name | ANLS | Paper title | Repository |
| --- | --- | --- | --- |
| MatCha | 0.742 | MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering | |
| GPT-4 | 0.884 | Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering | |
| PaLI-3 | 0.876 | PaLI-3 Vision Language Models: Smaller, Faster, Stronger | |
| Qwen-VL | 0.651 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | |
| ERNIE-Layout large | 0.8486 | ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding | |
| DUBLIN | 0.782 | DUBLIN -- Document Understanding By Language-Image Network | - |
| Pix2Struct-base | 0.721 | Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding | |
| DUBLIN (variable resolution) | 0.803 | DUBLIN -- Document Understanding By Language-Image Network | - |
| PaLI-3 (w/ OCR) | 0.886 | PaLI-3 Vision Language Models: Smaller, Faster, Stronger | |
| Qwen-VL-Plus | 0.9024 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | |
| PaLI-X (Single-task FT w/ OCR) | 0.868 | PaLI-X: On Scaling up a Multilingual Vision and Language Model | |
| PaLI-X (Single-task FT) | 0.80 | PaLI-X: On Scaling up a Multilingual Vision and Language Model | |
| Claude + LATIN-Prompt | 0.8336 | Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering | |
| TILT-Large | 0.8705 | Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | |
| BERT_LARGE_SQUAD_DOCVQA_FINETUNED_Baseline | 0.665 | DocVQA: A Dataset for VQA on Document Images | |
| DocFormerv2-large | 0.8784 | DocFormerv2: Local Features for Document Understanding | |
| MLCD-Embodied-7B | 0.916 | Multi-label Cluster Discrimination for Visual Representation Learning | |
| UDOP (aux) | 0.878 | Unifying Vision, Text, and Layout for Universal Document Processing | |
| UDOP | 0.847 | Unifying Vision, Text, and Layout for Universal Document Processing | |
| SMoLA-PaLI-X Generalist | 0.906 | Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | - |