Chart Question Answering on ChartQA
Evaluation Metric
1:1 Accuracy
Evaluation Results
Performance of each model on this benchmark
| Model Name | 1:1 Accuracy | Paper Title | Repository |
|---|---|---|---|
| PaLI-3 (w/ OCR) | 69.5 | PaLI-3 Vision Language Models: Smaller, Faster, Stronger | |
| DePlot+GPT3 (Self-Consistency) | 42.3 | DePlot: One-shot visual language reasoning by plot-to-table translation | |
| MatCha | 64.2 | MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering | |
| UniChart | 66.24 | UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning | |
| PaLI-X (Single-task FT) | 70.9 | PaLI-X: On Scaling up a Multilingual Vision and Language Model | |
| DePlot+GPT3 (CoT) | 36.9 | DePlot: One-shot visual language reasoning by plot-to-table translation | |
| VisionTapas-OCR | 45.5 | ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning | |
| Pix2Struct-large | 58.6 | Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding | |
| Gemini Ultra | 80.8 | Gemini: A Family of Highly Capable Multimodal Models | |
| StructChart+GPT3.5 (STR) | 60.7 | StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding | - |
| SMoLA-PaLI-X Generalist Model | 73.8 | Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | - |
| Qwen-VL | 65.7 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | |
| SMoLA-PaLI-X Specialist Model | 74.6 | Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | - |
| ScreenAI 5B (4.62 B params, w/ OCR) | 76.7 | ScreenAI: A Vision-Language Model for UI and Infographics Understanding | |
| ChartPaLI-5B + PaLM 2-S | 81.3 | Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs | - |
| DePlot+Codex (PoT Self-Consistency) | 76.7 | DePlot: One-shot visual language reasoning by plot-to-table translation | |
| PaLI-X (Multi-task FT) | 70.6 | PaLI-X: On Scaling up a Multilingual Vision and Language Model | |
| DePlot+FlanPaLM (CoT) | 67.3 | DePlot: One-shot visual language reasoning by plot-to-table translation | |
| PaLI-3 | 70 | PaLI-3 Vision Language Models: Smaller, Faster, Stronger | |
| Pix2Struct-base | 56.0 | Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding | |