HyperAI超神経

Visual Question Answering on OK-VQA

Evaluation Metric

Accuracy

Evaluation Results

Performance results of each model on this benchmark

Comparison Table
Model name                                       Accuracy
palm-e-an-embodied-multimodal-language-model     66.1
an-empirical-study-of-gpt-3-for-few-shot         48.0
language-models-are-general-purpose              11.4
revive-regional-visual-representation-matters    58.0
a-simple-baseline-for-knowledge-based-visual     61.2
prompting-large-language-models-with-answer      62.5
plug-and-play-vqa-zero-shot-vqa-by-conjoining    35.9
retrieval-augmented-visual-question-answering    51.22
vlc-bert-visual-question-answering-with          43.1
blip-2-bootstrapping-language-image-pre          39.4
multimodal-few-shot-learning-with-frozen         5.9
lako-knowledge-driven-visual-question            42.03
implicit-differentiable-outlier-detection        52.4
a-good-prompt-is-worth-millions-of-parameters    16.5
blip-2-bootstrapping-language-image-pre          45.9
lako-knowledge-driven-visual-question            47.01
enabling-multimodal-generation-on-clip-via       10.5
fine-grained-late-interaction-multi-modal-1      62.08
blip-2-bootstrapping-language-image-pre          31.7
flamingo-a-visual-language-model-for-few-shot-1  41.2
pali-a-jointly-scaled-multilingual-language      64.5
lyrics-boosting-fine-grained-language-vision     58.2
revive-regional-visual-representation-matters    56.6
flamingo-a-visual-language-model-for-few-shot-1  44.7
blip-2-bootstrapping-language-image-pre          40.7
visual-program-distillation-distilling-tools     66.8
promptcap-prompt-guided-task-aware-image         60.4
pali-x-on-scaling-up-a-multilingual-vision       66.1
differentiable-outlier-detection-enable          52.4
fine-grained-late-interaction-multi-modal-1      54.85
retrieval-augmented-visual-question-answering    54.48
reveal-retrieval-augmented-visual-language       59.1
blip-2-bootstrapping-language-image-pre          30.2
transform-retrieve-generate-natural-language     50.50
blip-2-bootstrapping-language-image-pre          36.4
hydra-a-hyper-agent-for-dynamic-compositional    48.6
flamingo-a-visual-language-model-for-few-shot-1  50.6