HyperAI
Visual Question Answering on A-OKVQA
Metrics
DA VQA Score
MC Accuracy
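The page lists these two metrics without defining them. As a minimal sketch, assuming the commonly used simplified VQA scoring rule (a direct answer earns min(1, matches / 3) against the human reference answers) and plain accuracy for the multiple-choice setting, they could be computed as shown below. The function names `da_vqa_score` and `mc_accuracy` are hypothetical, and the benchmark's official evaluation script may normalize answers differently.

```python
from collections import Counter


def da_vqa_score(prediction, reference_answers):
    """Simplified VQA score for one direct-answer (DA) question.

    Assumption: the prediction gets full credit once at least 3 human
    reference answers match it, and partial credit (matches / 3) otherwise.
    Normalization here is only lowercasing and whitespace stripping.
    """
    normalized = Counter(a.strip().lower() for a in reference_answers)
    matches = normalized[prediction.strip().lower()]
    return min(1.0, matches / 3.0)


def mc_accuracy(predicted_indices, correct_indices):
    """Fraction of multiple-choice (MC) questions answered correctly."""
    if len(predicted_indices) != len(correct_indices):
        raise ValueError("predictions and labels must have the same length")
    if not correct_indices:
        return 0.0
    hits = sum(p == c for p, c in zip(predicted_indices, correct_indices))
    return hits / len(correct_indices)


if __name__ == "__main__":
    # Hypothetical reference answers for a single question.
    refs = ["umbrella"] * 2 + ["parasol"] * 8
    print(da_vqa_score("umbrella", refs))
    print(mc_accuracy([0, 1, 2, 3], [0, 1, 0, 3]))
```

Leaderboards such as this one typically report both values as percentages, that is, the per-question scores averaged over the evaluation split and multiplied by 100.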
Results
Performance of different models on this benchmark
| Model Name | DA VQA Score | MC Accuracy | Paper Title |
| --- | --- | --- | --- |
| SMoLA-PaLI-X Specialist Model | 70.55 | 83.75 | Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts |
| PaLI-X-VPD | 68.2 | 80.4 | Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models |
| PromptCap | 59.6 | 73.2 | PromptCap: Prompt-Guided Task-Aware Image Captioning |
| Prophet | 58.5 | 75.1 | Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering |
| A Simple Baseline for KB-VQA | 57.5 | - | A Simple Baseline for Knowledge-Based Visual Question Answering |
| KRISP | 42.2 | 42.2 | KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA |
| GPV-2 | 40.7 | 53.7 | Webly Supervised Concept Expansion for General Purpose Vision Models |
| VLC-BERT | 38.05 | - | VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge |
| LXMERT | 25.9 | 41.6 | LXMERT: Learning Cross-Modality Encoder Representations from Transformers |
| ViLBERT | 25.9 | 41.5 | ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks |
| Pythia | 21.9 | 40.1 | Pythia v0.1: the Winning Entry to the VQA Challenge 2018 |
| ViLBERT - VQA | 12.0 | 42.1 | ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks |
| ViLBERT - OK-VQA | 9.2 | 34.1 | ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks |
| MC-CoT | - | 71 | Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training |
| HYDRA | - | 56.35 | HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning |