Visual Question Answering (VQA) on CORE-MM
Evaluation Metrics
Abductive
Analogical
Deductive
Overall score
Params
Evaluation Results
Performance results of each model on this benchmark.
| Model Name | Abductive | Analogical | Deductive | Overall score | Params | Paper Title | Repository |
|---|---|---|---|---|---|---|---|
| MiniGPT-v2 | 13.28 | 5.69 | 11.02 | 10.43 | 8B | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | |
| BLIP-2-OPT2.7B | 18.96 | 7.5 | 2.76 | 19.31 | 3B | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
| GPT-4V | 77.88 | 69.86 | 74.86 | 74.44 | - | GPT-4 Technical Report | |
| SPHINX v2 | 49.85 | 20.69 | 42.17 | 39.48 | 16B | SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models | |
| InstructBLIP | 37.76 | 20.56 | 27.56 | 28.02 | 8B | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | |
| Emu | 36.57 | 18.19 | 28.9 | 28.24 | 14B | Emu: Generative Pretraining in Multimodality | |
| Otter | 33.64 | 13.33 | 22.49 | 22.69 | 7B | Otter: A Multi-Modal Model with In-Context Instruction Tuning | |
| CogVLM-Chat | 47.88 | 28.75 | 36.75 | 37.16 | 17B | CogVLM: Visual Expert for Pretrained Language Models | |
| mPLUG-Owl2 | 20.6 | 7.64 | 23.43 | 20.05 | 7B | mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration | |
| OpenFlamingo-v2 | 5.3 | 1.11 | 8.88 | 6.82 | 9B | OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models | |
| LLaVA-1.5 | 47.91 | 24.31 | 30.94 | 32.62 | 13B | Improved Baselines with Visual Instruction Tuning | |
| Qwen-VL-Chat | 44.39 | 30.42 | 37.55 | 37.39 | 16B | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | |
| LLaMA-Adapter V2 | 46.12 | 22.08 | 28.7 | 30.46 | 7B | LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model | |
| InternLM-XComposer-VL | 35.97 | 18.61 | 26.77 | 26.84 | 9B | InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition | |
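A minimal sketch of how the leaderboard above could be loaded and ranked programmatically, assuming the table has been exported to a CSV file named core_mm_leaderboard.csv (hypothetical filename) with the same column headers; "-" entries (e.g. Params for GPT-4V) are treated as missing values.

```python
# Minimal sketch: load the CORE-MM leaderboard and rank models by Overall score.
# Assumes "core_mm_leaderboard.csv" (hypothetical filename) contains the table
# above with identical column headers; "-" is parsed as a missing value.
import pandas as pd

df = pd.read_csv("core_mm_leaderboard.csv", na_values=["-"])

# Sort by the benchmark's aggregate metric, best first.
ranking = df.sort_values("Overall score", ascending=False)

print(
    ranking[
        ["Model Name", "Abductive", "Analogical", "Deductive", "Overall score", "Params"]
    ].to_string(index=False)
)
```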