HyperAI
Visual Question Answering (VQA)
Visual Question Answering on OK-VQA
Evaluation metric
Accuracy
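The page names the metric only as "Accuracy". OK-VQA results are commonly computed with the standard VQA soft-accuracy metric, in which a prediction counts as fully correct when at least 3 human annotators gave the same answer, averaged over the leave-one-annotator-out subsets. The sketch below assumes that metric. The function names are hypothetical, and the answer normalization is deliberately simpler than in the official VQA evaluation script.

```python
from itertools import combinations


def normalize(answer: str) -> str:
    # Simplified normalization (assumption): lowercase and trim whitespace only.
    # The official VQA evaluation script also strips punctuation and articles
    # and maps number words to digits; that is omitted here.
    return answer.strip().lower()


def vqa_accuracy(prediction: str, human_answers: list[str]) -> float:
    """Soft accuracy for one question (assumed standard VQA metric).

    For each subset formed by dropping one of the N human answers, the score
    is min(1, matches / 3). The result is the mean over those N subsets.
    """
    pred = normalize(prediction)
    answers = [normalize(a) for a in human_answers]
    scores = []
    # combinations() works on positions, so duplicate answers are handled
    # correctly: each subset omits exactly one annotator.
    for subset in combinations(answers, len(answers) - 1):
        matches = sum(1 for a in subset if a == pred)
        scores.append(min(1.0, matches / 3))
    return sum(scores) / len(scores)


def dataset_accuracy(predictions: list[str], references: list[list[str]]) -> float:
    """Mean per-question soft accuracy, scaled to a percentage."""
    if not predictions:
        raise ValueError("no predictions to score")
    per_question = [vqa_accuracy(p, refs) for p, refs in zip(predictions, references)]
    return 100.0 * sum(per_question) / len(per_question)
```

For example, if 2 of 10 annotators answered "dog" and the model predicts "dog", the question scores 0.6, not 0.2. Partial agreement is rewarded, which is why the Accuracy column is not a plain exact-match rate.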
Evaluation results
Performance of each model on this benchmark.
| Model | Accuracy (%) | Paper title |
|---|---|---|
| PaLI-X-VPD | 66.8 | Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models |
| PaLM-E-562B | 66.1 | PaLM-E: An Embodied Multimodal Language Model |
| PaLI-X (Single-task FT) | 66.1 | PaLI-X: On Scaling up a Multilingual Vision and Language Model |
| PaLI 17B | 64.5 | PaLI: A Jointly-Scaled Multilingual Language-Image Model |
| Prophet | 62.5 | Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering |
| RA-VQA-v2 (BLIP 2) | 62.08 | Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering |
| A Simple Baseline for KB-VQA | 61.2 | A Simple Baseline for Knowledge-Based Visual Question Answering |
| PromptCap | 60.4 | PromptCap: Prompt-Guided Task-Aware Image Captioning |
| ReVeaL WIT + CC12M + Wikidata + VQA-2 | 59.1 | REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory |
| Lyrics | 58.2 | Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects |
| REVIVE (Ensemble) | 58.0 | REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering |
| REVIVE (Single) | 56.6 | REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering |
| RA-VQA-v2 (T5-large) | 54.85 | Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering |
| RA-VQA (T5-large) | 54.48 | Retrieval Augmented Visual Question Answering with Outside Knowledge |
| VK-OOD | 52.4 | Implicit Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis |
| VK-OOD | 52.4 | Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis |
| RA-VQA-FrDPR (T5-large) | 51.22 | Retrieval Augmented Visual Question Answering with Outside Knowledge |
| Flamingo80B | 50.6 | Flamingo: a Visual Language Model for Few-Shot Learning |
| TRiG (T5-Large) | 50.50 | Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering |
| HYDRA | 48.6 | HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning |
Showing 20 of 37 entries.