HyperAI
Visual Question Answering on VCR (Q-A) Test
Metrics
Accuracy
Results
Performance results of various models on this benchmark
| Model Name | Accuracy (%) | Paper Title |
| --- | --- | --- |
| GPT4RoI | 89.4 | GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest |
| ERNIE-ViL-large (ensemble of 15 models) | 81.6 | ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph |
| UNITER-large (10 ensemble) | 79.8 | UNITER: UNiversal Image-TExt Representation Learning |
| MAD (Single Model, Formerly CLIP-TD) | 79.6 | Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks |
| UNITER (Large) | 77.3 | UNITER: UNiversal Image-TExt Representation Learning |
| KVL-BERT_LARGE | 76.4 | KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning |
| VL-BERT_LARGE | 75.8 | VL-BERT: Pre-training of Generic Visual-Linguistic Representations |
| VL-T5 | 75.3 | Unifying Vision-and-Language Tasks via Text Generation |
| VisualBERT | 71.6 | VisualBERT: A Simple and Performant Baseline for Vision and Language |
| OFA-X | 71.2 | Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations |
| OFA-X-MT | 62 | Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations |