HyperAI
Visual Question Answering on OK-VQA
Evaluation Metric
Accuracy
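The page does not say how Accuracy is computed. OK-VQA results are commonly reported with the standard VQA soft accuracy, under which a predicted answer scores min(n / 3, 1) when n human annotators gave that answer. The sketch below assumes that convention. It only lowercases and trims answers, so it omits the fuller answer normalization an official evaluation script would apply, and the function names `vqa_soft_accuracy` and `benchmark_accuracy` are hypothetical.

```python
from collections import Counter


def vqa_soft_accuracy(prediction, human_answers):
    """Score one prediction as min(matching human answers / 3, 1).

    Assumption: answers are compared after lowercasing and trimming only.
    """
    counts = Counter(a.strip().lower() for a in human_answers)
    return min(counts[prediction.strip().lower()] / 3.0, 1.0)


def benchmark_accuracy(predictions, references):
    """Mean soft accuracy over all questions, expressed as a percentage."""
    scores = [
        vqa_soft_accuracy(pred, refs)
        for pred, refs in zip(predictions, references)
    ]
    return 100.0 * sum(scores) / len(scores)
```

Under this convention a prediction matched by three or more annotators earns full credit, and one matched by a single annotator earns one third, which is why leaderboard scores are percentages rather than exact-match rates.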
Evaluation Results
Performance results of each model on this benchmark
| Model | Accuracy | Paper Title |
| --- | --- | --- |
| PaLI-X-VPD | 66.8 | Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models |
| PaLM-E-562B | 66.1 | PaLM-E: An Embodied Multimodal Language Model |
| PaLI-X (Single-task FT) | 66.1 | PaLI-X: On Scaling up a Multilingual Vision and Language Model |
| PaLI 17B | 64.5 | PaLI: A Jointly-Scaled Multilingual Language-Image Model |
| Prophet | 62.5 | Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering |
| RA-VQA-v2 (BLIP 2) | 62.08 | Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering |
| A Simple Baseline for KB-VQA | 61.2 | A Simple Baseline for Knowledge-Based Visual Question Answering |
| PromptCap | 60.4 | PromptCap: Prompt-Guided Task-Aware Image Captioning |
| ReVeaL WIT + CC12M + Wikidata + VQA-2 | 59.1 | REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory |
| Lyrics | 58.2 | Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects |
| REVIVE (Ensemble) | 58.0 | REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering |
| REVIVE (Single) | 56.6 | REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering |
| RA-VQA-v2 (T5-large) | 54.85 | Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering |
| RA-VQA (T5-large) | 54.48 | Retrieval Augmented Visual Question Answering with Outside Knowledge |
| VK-OOD | 52.4 | Implicit Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis |
| VK-OOD | 52.4 | Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis |
| RA-VQA-FrDPR (T5-large) | 51.22 | Retrieval Augmented Visual Question Answering with Outside Knowledge |
| Flamingo80B | 50.6 | Flamingo: a Visual Language Model for Few-Shot Learning |
| TRiG (T5-Large) | 50.50 | Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering |
| HYDRA | 48.6 | HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning |
The first 20 of 37 entries are listed.