Visual Question Answering on VQA v2 test-std
Evaluation metric
overall
Evaluation results
Performance of each model on this benchmark
Model name | overall | Paper title
--- | --- | ---
BEiT-3 | 84.03 | Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
mPLUG-Huge | 83.62 | mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
ONE-PEACE | 82.52 | ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
X2-VLM (large) | 81.8 | X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks
VLMo | 81.30 | VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
SimVLM | 80.34 | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
X2-VLM (base) | 80.2 | X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks
VAST | 80.19 | -
VALOR | 78.62 | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Prompt Tuning | 78.53 | Prompt Tuning for Generative Multimodal Pretrained Models
Prismer | 78.49 | Prismer: A Vision-Language Model with Multi-Task Experts
MSR + MS Cog. Svcs., X10 models | 77.45 | VinVL: Revisiting Visual Representations in Vision-Language Models
MSR + MS Cog. Svcs. | 76.63 | VinVL: Revisiting Visual Representations in Vision-Language Models
ALBEF (14M) | 76.04 | Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
BGN, ensemble | 75.92 | Bilinear Graph Networks for Visual Question Answering
ERNIE-ViL-single model | 74.93 | ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
Single, w/o VLP | 74.16 | In Defense of Grid Features for Visual Question Answering
Single, w/o VLP | 73.86 | Deep Multimodal Neural Architecture Search
UNITER (Large) | 73.4 | UNITER: UNiversal Image-TExt Representation Learning
X-101 grid features + MCAN | 72.71 | In Defense of Grid Features for Visual Question Answering