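Visual Question Answering (VQA)
Visual Question Answering (VQA) is a computer vision task in which a system answers natural-language questions about an image. The core goal is for the machine to understand the image content and give accurate, coherent answers in natural language. VQA has important applications in human-computer interaction, intelligent assistance, and content understanding, and work on it advances machines' visual cognition capabilities.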
GQA Test2019
VQA v2 test-dev
Oscar
VQA v2 test-std
BEiT-3
OK-VQA
MetaLM
MSVD-QA
HCRN
MSRVTT-QA
HCRN
DocVQA test
Human
InfographicVQA
Gemini Ultra (pixel only)
GQA test-dev
CFR
VizWiz 2020 VQA
CLEVR
NS-VQA (1K programs)
A-OKVQA
InfiMM-Eval
GPT-4V
COCO Visual Question Answering (VQA) real images 1.0 open ended
IconQA
Patch-TRM
TextVQA test-standard
PaLI
VQA v2 val
BLIP-2 ViT-G FlanT5 XXL (zero-shot)
VCR (Q-A) test
VizWiz 2018
LXR955, No Ensemble
VQA-CP
CSS
COCO Visual Question Answering (VQA) real images 1.0 multiple choice
MCB 7 att.
VQA-CE
RandImg
VLM2-Bench
VCR (QA-R) test
UNITER (Large)
InfoSeek
VQA v1 test-dev
SAAA (ResNet)
VCR (Q-AR) test
GPT4RoI
IllusionVQA
GQA test-std
ProTo
VQA v1 test-std
SAAA (ResNet)
WHOOPS!
VizWiz 2020 Answerability
QLEVR
MAC
AutoHallusion
GPT-4V
CLEVR-Humans
MDETR
PMC-VQA
PlotQA-D1
COCO Visual Question Answering (VQA) real images 2.0 open ended
HDU-USYD-UNCC
COCO Visual Question Answering (VQA) abstract images 1.0 open ended
COCO Visual Question Answering (VQA) abstract 1.0 multiple choice
AI2D
Visual7W
CMN
HallusionBench
GPT-4V
PlotQA-D2
VCR (QA-R) dev
VL-BERT (Large)
VCR (Q-AR) dev
VL-BERT (Large)
F-VQA
ZS-F-VQA
FigureQA - test 1
PReFIL
VCR (Q-A) dev
VL-BERT (Large)
GRIT
DocVQA val
BERT LARGE Baseline
TGIF-QA
TDIUC
Accuracy
GQA
PEVL+
VQA-X
RetVQA
MI-BART
Visual Genome (pairs)
CMN
ArtQuest
PrefixLM with CLIP and T5
OVAD benchmark
EgoSchema
Lyra-Pro
COCO
MME
ActivityNet
BLIP-2 T5
Visual Genome (subjects)
Video MME
CORE-MM
MM-Vet
DVQA test-familiar
PReFIL (Oracle OCR)
DeepForm
MVBench
WebSRC
ZS-F-VQA
SAN † - hard mask
VizWiz 2018 Answerability
DocVQA
TextVQA
ImageNet