Visual Question Answering on A-OKVQA
Metrics: DA VQA Score, MC Accuracy

Results

Performance results of various models on this benchmark.
| Model | DA VQA Score | MC Accuracy | Paper Title |
|---|---|---|---|
| SMoLA-PaLI-X Specialist Model | 70.55 | 83.75 | Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts |
| PaLI-X-VPD | 68.2 | 80.4 | Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models |
| PromptCap | 59.6 | 73.2 | PromptCap: Prompt-Guided Task-Aware Image Captioning |
| Prophet | 58.5 | 75.1 | Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering |
| A Simple Baseline for KB-VQA | 57.5 | - | A Simple Baseline for Knowledge-Based Visual Question Answering |
| KRISP | 42.2 | 42.2 | KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA |
| GPV-2 | 40.7 | 53.7 | Webly Supervised Concept Expansion for General Purpose Vision Models |
| VLC-BERT | 38.05 | - | VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge |
| LXMERT | 25.9 | 41.6 | LXMERT: Learning Cross-Modality Encoder Representations from Transformers |
| ViLBERT | 25.9 | 41.5 | ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks |
| Pythia | 21.9 | 40.1 | Pythia v0.1: the Winning Entry to the VQA Challenge 2018 |
| ViLBERT - VQA | 12.0 | 42.1 | ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks |
| ViLBERT - OK-VQA | 9.2 | 34.1 | ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks |
| MC-CoT | - | 71 | Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training |
| HYDRA | - | 56.35 | HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning |