HyperAI
Visual Question Answering on VQA v2 test-dev
Evaluation metric
Accuracy
Evaluation results
Performance of each model on this benchmark
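The accuracy reported on this benchmark is the standard VQA soft-accuracy score: a predicted answer gets credit proportional to how many of the 10 human annotators gave that answer, capped at 1.0 when at least 3 agree, and averaged over all leave-one-out subsets of the annotations. A minimal sketch of that metric (the function name and answer normalization are simplified here; the official evaluation also lowercases and strips punctuation before matching):

```python
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Soft VQA accuracy for one question.

    For each leave-one-out subset of the human answers, score
    min(#matching answers / 3, 1.0), then average the scores.
    """
    scores = []
    for i in range(len(human_answers)):
        # Drop annotator i, count matches among the remaining answers.
        others = human_answers[:i] + human_answers[i + 1:]
        matches = sum(ans == predicted for ans in others)
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)


# Full agreement: every subset has >= 3 matches, so the score is 1.0.
print(vqa_accuracy("cat", ["cat"] * 10))  # → 1.0
# Exactly 3 of 10 annotators said "cat": partial credit of 0.9.
print(vqa_accuracy("cat", ["cat"] * 3 + ["dog"] * 7))  # → 0.9
```

The leave-one-out averaging makes the metric robust to a single disagreeing annotator, which is why leaderboard scores are reported to two decimal places rather than as exact fractions.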
| Model | Accuracy | Paper Title |
| --- | --- | --- |
| PaLI | 84.3 | PaLI: A Jointly-Scaled Multilingual Language-Image Model |
| BEiT-3 | 84.19 | Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks |
| VLMo | 82.78 | VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts |
| ONE-PEACE | 82.6 | ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities |
| mPLUG (Huge) | 82.43 | mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections |
| CuMo-7B | 82.2 | CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts |
| X2-VLM (large) | 81.9 | X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks |
| MMU | 81.26 | Achieving Human Parity on Visual Question Answering |
| Lyrics | 81.2 | Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects |
| InternVL-C | 81.2 | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks |
| X2-VLM (base) | 80.4 | X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks |
| XFM (base) | 80.4 | Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks |
| VAST | 80.23 | - |
| SimVLM | 80.03 | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision |
| VALOR | 78.46 | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset |
| Prismer | 78.43 | Prismer: A Vision-Language Model with Multi-Task Experts |
| X-VLM (base) | 78.22 | Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts |
| VK-OOD | 77.9 | Implicit Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis |
| ALBEF (14M) | 75.84 | Align before Fuse: Vision and Language Representation Learning with Momentum Distillation |
| Oscar | 73.82 | Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks |

(First 20 of 56 leaderboard entries shown.)