Visual Question Answering on OK-VQA
Metrics
Accuracy
Results
Performance results of various models on this benchmark
Comparison table
Model name | Accuracy (%) |
---|---|
palm-e-an-embodied-multimodal-language-model | 66.1 |
an-empirical-study-of-gpt-3-for-few-shot | 48.0 |
language-models-are-general-purpose | 11.4 |
revive-regional-visual-representation-matters | 58.0 |
a-simple-baseline-for-knowledge-based-visual | 61.2 |
prompting-large-language-models-with-answer | 62.5 |
plug-and-play-vqa-zero-shot-vqa-by-conjoining | 35.9 |
retrieval-augmented-visual-question-answering | 51.22 |
vlc-bert-visual-question-answering-with | 43.1 |
blip-2-bootstrapping-language-image-pre | 39.4 |
multimodal-few-shot-learning-with-frozen | 5.9 |
lako-knowledge-driven-visual-question | 42.03 |
implicit-differentiable-outlier-detection | 52.4 |
a-good-prompt-is-worth-millions-of-parameters | 16.5 |
blip-2-bootstrapping-language-image-pre | 45.9 |
lako-knowledge-driven-visual-question | 47.01 |
enabling-multimodal-generation-on-clip-via | 10.5 |
fine-grained-late-interaction-multi-modal-1 | 62.08 |
blip-2-bootstrapping-language-image-pre | 31.7 |
flamingo-a-visual-language-model-for-few-shot-1 | 41.2 |
pali-a-jointly-scaled-multilingual-language | 64.5 |
lyrics-boosting-fine-grained-language-vision | 58.2 |
revive-regional-visual-representation-matters | 56.6 |
flamingo-a-visual-language-model-for-few-shot-1 | 44.7 |
blip-2-bootstrapping-language-image-pre | 40.7 |
visual-program-distillation-distilling-tools | 66.8 |
promptcap-prompt-guided-task-aware-image | 60.4 |
pali-x-on-scaling-up-a-multilingual-vision | 66.1 |
differentiable-outlier-detection-enable | 52.4 |
fine-grained-late-interaction-multi-modal-1 | 54.85 |
retrieval-augmented-visual-question-answering | 54.48 |
reveal-retrieval-augmented-visual-language | 59.1 |
blip-2-bootstrapping-language-image-pre | 30.2 |
transform-retrieve-generate-natural-language | 50.50 |
blip-2-bootstrapping-language-image-pre | 36.4 |
hydra-a-hyper-agent-for-dynamic-compositional | 48.6 |
flamingo-a-visual-language-model-for-few-shot-1 | 50.6 |
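
The table is unsorted, and several paper slugs appear more than once with different scores. The source does not say why. The sketch below assumes the repeats are different variants or evaluation settings of the same paper. It parses a handful of rows copied from the table, keeps the highest score per slug, and prints a ranking. The names `TABLE`, `parse_rows`, and `best_per_entry` are hypothetical helpers introduced here for illustration, not part of any benchmark tooling.

```python
# A few rows copied verbatim from the table above (not the full leaderboard).
TABLE = """\
palm-e-an-embodied-multimodal-language-model | 66.1 |
visual-program-distillation-distilling-tools | 66.8 |
pali-x-on-scaling-up-a-multilingual-vision | 66.1 |
blip-2-bootstrapping-language-image-pre | 39.4 |
blip-2-bootstrapping-language-image-pre | 45.9 |
multimodal-few-shot-learning-with-frozen | 5.9 |
"""


def parse_rows(text):
    """Return (slug, accuracy) pairs from pipe-delimited lines.

    Header and separator lines are skipped because their second
    cell does not parse as a number.
    """
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.split("|")]
        if len(cells) < 2 or not cells[0]:
            continue
        try:
            score = float(cells[1])
        except ValueError:
            continue  # header or separator row
        rows.append((cells[0], score))
    return rows


def best_per_entry(rows):
    """Keep the highest score per slug, sorted from best to worst.

    Assumption: repeated slugs are variants of one paper, so the
    maximum is a reasonable single number to report for it.
    """
    best = {}
    for name, score in rows:
        if name not in best or score > best[name]:
            best[name] = score
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


ranked = best_per_entry(parse_rows(TABLE))
for name, score in ranked:
    print(f"{score:6.2f}  {name}")
```

Taking the maximum hides the spread between variants of the same paper. To compare settings (for example zero-shot against fine-tuned), keep every row instead and add the setting as an extra column, which this table does not record.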