
Visual Question Answering on VCR (Q→A) Test

Evaluation Metrics

Accuracy

Evaluation Results

Performance of each model on this benchmark

Model	Accuracy	Paper Title
GPT4RoI	89.4	GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
ERNIE-ViL-large(ensemble of 15 models)	81.6	ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
UNITER-large (10 ensemble)	79.8	UNITER: UNiversal Image-TExt Representation Learning
MAD (Single Model, Formerly CLIP-TD)	79.6	Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
UNITER (Large)	77.3	UNITER: UNiversal Image-TExt Representation Learning
KVL-BERT-LARGE	76.4	KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning
VL-BERT-LARGE	75.8	VL-BERT: Pre-training of Generic Visual-Linguistic Representations
VL-T5	75.3	Unifying Vision-and-Language Tasks via Text Generation
VisualBERT	71.6	VisualBERT: A Simple and Performant Baseline for Vision and Language
OFA-X	71.2	Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations
OFA-X-MT	62	Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations
