HyperAI
Visual Question Answering on VCR (Q→A) Test
Evaluation metric: Accuracy
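As a rough illustration of the metric, the sketch below computes multiple-choice accuracy as the percentage of questions where the predicted answer index matches the gold answer index. It assumes that each question has a fixed set of answer choices (four in the VCR dataset, so random guessing would score about 25%) and that the scores in the table are percentages. The function name `multiple_choice_accuracy` and the toy data are hypothetical and are not taken from any official VCR evaluation script.

```python
from typing import Sequence


def multiple_choice_accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Return accuracy in percent: the share of questions whose predicted
    answer index equals the gold answer index.

    Hypothetical helper for illustration; not the official VCR scorer.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("need at least one question")
    # Count exact matches between predicted and gold choice indices.
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


# Hypothetical toy data: 4 questions, answer choices indexed 0-3.
pred = [2, 0, 1, 3]
gold = [2, 1, 1, 3]
print(multiple_choice_accuracy(pred, gold))  # 75.0
```

Three of the four toy predictions match, giving 75.0. Reporting the result as a percentage keeps it on the same scale as the leaderboard values.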
Evaluation results
Performance of each model on this benchmark:

| Model | Accuracy (%) | Paper title |
|---|---|---|
| GPT4RoI | 89.4 | GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest |
| ERNIE-ViL-large (ensemble of 15 models) | 81.6 | ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph |
| UNITER-large (10 ensemble) | 79.8 | UNITER: UNiversal Image-TExt Representation Learning |
| MAD (Single Model, Formerly CLIP-TD) | 79.6 | Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks |
| UNITER (Large) | 77.3 | UNITER: UNiversal Image-TExt Representation Learning |
| KVL-BERT (Large) | 76.4 | KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning |
| VL-BERT (Large) | 75.8 | VL-BERT: Pre-training of Generic Visual-Linguistic Representations |
| VL-T5 | 75.3 | Unifying Vision-and-Language Tasks via Text Generation |
| VisualBERT | 71.6 | VisualBERT: A Simple and Performant Baseline for Vision and Language |
| OFA-X | 71.2 | Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations |
| OFA-X-MT | 62 | Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations |