HyperAI
Visual Question Answering on A-OKVQA
Metrics
DA VQA Score
MC Accuracy
Results
Performance results of various models on this benchmark
| Model Name | DA VQA Score | MC Accuracy | Paper Title |
|---|---|---|---|
| SMoLA-PaLI-X Specialist Model | 70.55 | 83.75 | Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts |
| PaLI-X-VPD | 68.2 | 80.4 | Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models |
| PromptCap | 59.6 | 73.2 | PromptCap: Prompt-Guided Task-Aware Image Captioning |
| Prophet | 58.5 | 75.1 | Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering |
| A Simple Baseline for KB-VQA | 57.5 | - | A Simple Baseline for Knowledge-Based Visual Question Answering |
| KRISP | 42.2 | 42.2 | KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA |
| GPV-2 | 40.7 | 53.7 | Webly Supervised Concept Expansion for General Purpose Vision Models |
| VLC-BERT | 38.05 | - | VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge |
| LXMERT | 25.9 | 41.6 | LXMERT: Learning Cross-Modality Encoder Representations from Transformers |
| ViLBERT | 25.9 | 41.5 | ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks |
| Pythia | 21.9 | 40.1 | Pythia v0.1: the Winning Entry to the VQA Challenge 2018 |
| ViLBERT - VQA | 12.0 | 42.1 | ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks |
| ViLBERT - OK-VQA | 9.2 | 34.1 | ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks |
| MC-CoT | - | 71 | Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training |
| HYDRA | - | 56.35 | HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning |