HyperAI
HyperAI
Home
Console
Docs
News
Papers
Tutorials
Datasets
Wiki
SOTA
LLM Models
GPU Leaderboard
Events
Search
About
Terms of Service
Privacy Policy
English
HyperAI
HyperAI
Toggle Sidebar
Search the site…
⌘
K
Command Palette
Search for a command to run...
Console
Home
SOTA
Visual Question Answering (VQA)
Visual Question Answering On Ok Vqa
Visual Question Answering On Ok Vqa
Metrics
Accuracy
Results
Performance results of various models on this benchmark
Columns
Model Name
Accuracy
Paper Title
PaLI-X-VPD
66.8
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
PaLM-E-562B
66.1
PaLM-E: An Embodied Multimodal Language Model
PaLI-X (Single-task FT)
66.1
PaLI-X: On Scaling up a Multilingual Vision and Language Model
PaLI 17B
64.5
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Prophet
62.5
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
RA-VQA-v2 (BLIP 2)
62.08
Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering
A Simple Baseline for KB-VQA
61.2
A Simple Baseline for Knowledge-Based Visual Question Answering
PromptCap
60.4
PromptCap: Prompt-Guided Task-Aware Image Captioning
ReVeaL WIT + CC12M + Wikidata + VQA-2
59.1
REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory
Lyrics
58.2
Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects
REVIVE (Ensemble)
58.0
REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
REVIVE (Single)
56.6
REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
RA-VQA-v2 (T5-large)
54.85
Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering
RA-VQA (T5-large)
54.48
Retrieval Augmented Visual Question Answering with Outside Knowledge
VK-OOD
52.4
Implicit Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis
VK-OOD
52.4
Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis
RA-VQA-FrDPR (T5-large)
51.22
Retrieval Augmented Visual Question Answering with Outside Knowledge
Flamingo80B
50.6
Flamingo: a Visual Language Model for Few-Shot Learning
TRiG (T5-Large)
50.50
Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering
HYDRA
48.6
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
0 of 37 row(s) selected.
Previous
Next
Visual Question Answering On Ok Vqa | SOTA | HyperAI