Visual Question Answering on DocVQA Test
Metrics
ANLS
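ANLS (Average Normalized Levenshtein Similarity) is the only metric reported on this page, so a small sketch of how such a score is typically computed may help in reading the numbers. The version below follows the commonly cited definition: for each question, take the best similarity over all accepted answers, set it to zero when the normalized edit distance reaches a threshold of 0.5, and average over questions. The function names and the `normalize` step (lowercasing and collapsing whitespace) are assumptions made for illustration; this is not the official DocVQA evaluation script, whose preprocessing may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-wise dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(text: str) -> str:
    """Assumed preprocessing: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def anls(predictions, references, threshold=0.5):
    """Average over questions of the best thresholded similarity.

    predictions: list of predicted answer strings, one per question.
    references:  list of lists of accepted answer strings, one list per question.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    total = 0.0
    for prediction, answers in zip(predictions, references):
        pred = normalize(prediction)
        best = 0.0
        for answer in answers:
            ref = normalize(answer)
            longest = max(len(pred), len(ref))
            distance = levenshtein(pred, ref) / longest if longest else 0.0
            # Similarities at or beyond the threshold distance count as a miss.
            score = 1.0 - distance if distance < threshold else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)
```

Under this sketch, a prediction that differs from the closest accepted answer by one character in four scores 0.75 for that question, while a prediction sharing no characters with any accepted answer scores 0. The threshold is what keeps loosely related strings from earning partial credit.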
Results
Performance of various models on this benchmark
| Model name | ANLS | Paper title |
| --- | --- | --- |
| Human | 0.9436 | DocVQA: A Dataset for VQA on Document Images |
| MLCD-Embodied-7B | 0.916 | Multi-label Cluster Discrimination for Visual Representation Learning |
| SMoLA-PaLI-X Specialist | 0.908 | Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts |
| SMoLA-PaLI-X Generalist | 0.906 | Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts |
| Qwen-VL-Plus | 0.9024 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond |
| ScreenAI 5B (4.62 B params, w/OCR) | 0.8988 | ScreenAI: A Vision-Language Model for UI and Infographics Understanding |
| PaLI-3 (w/ OCR) | 0.886 | PaLI-3 Vision Language Models: Smaller, Faster, Stronger |
| ERNIE-Layout large (ensemble) | 0.8841 | ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding |
| GPT-4 | 0.884 | Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering |
| DocFormerv2-large | 0.8784 | DocFormerv2: Local Features for Document Understanding |
| UDOP (aux) | 0.878 | Unifying Vision, Text, and Layout for Universal Document Processing |
| PaLI-3 | 0.876 | PaLI-3 Vision Language Models: Smaller, Faster, Stronger |
| TILT-Large | 0.8705 | Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer |
| PaLI-X (Single-task FT w/ OCR) | 0.868 | PaLI-X: On Scaling up a Multilingual Vision and Language Model |
| LayoutLMv2LARGE | 0.8672 | LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding |
| ERNIE-Layout large | 0.8486 | ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding |
| UDOP | 0.847 | Unifying Vision, Text, and Layout for Universal Document Processing |
| TILT-Base | 0.8392 | Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer |
| Claude + LATIN-Prompt | 0.8336 | Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering |
| GPT-3.5 + LATIN-Prompt | 0.8255 | Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering |