Visual Question Answering on GQA Test-Dev
Metrics
Accuracy
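The table below reports accuracy as the percentage of questions answered correctly. A minimal sketch of this scoring, assuming exact string match between each predicted answer and the single ground-truth GQA answer after simple normalization (the official GQA evaluation script may apply additional normalization); the function and variable names here are illustrative, not part of the official tooling:

```python
def normalize(answer: str) -> str:
    """Lowercase and strip an answer string before comparison."""
    return answer.strip().lower()

def gqa_accuracy(predictions: dict[str, str], references: dict[str, str]) -> float:
    """Percentage of questions whose normalized prediction matches the reference.

    Both dicts map question IDs to answer strings; missing predictions count as wrong.
    """
    correct = sum(
        normalize(predictions.get(qid, "")) == normalize(ans)
        for qid, ans in references.items()
    )
    return 100.0 * correct / len(references)

# Example: two of three answers match, giving 66.7% accuracy.
refs = {"q1": "yes", "q2": "red", "q3": "2"}
preds = {"q1": "Yes", "q2": "blue", "q3": "2"}
print(f"{gqa_accuracy(preds, refs):.1f}")  # 66.7
```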
Results
Performance results of various models on this benchmark
Comparison table
Model name | Accuracy |
---|---|
blip-2-bootstrapping-language-image-pre | 34.6 |
blip-2-bootstrapping-language-image-pre | 44.7 |
plug-and-play-vqa-zero-shot-vqa-by-conjoining | 41.9 |
visual-program-distillation-distilling-tools | 67.3 |
lxmert-learning-cross-modality-encoder | 60.0 |
blip-2-bootstrapping-language-image-pre | 44.4 |
a-good-prompt-is-worth-millions-of-parameters | 29.3 |
blip-2-bootstrapping-language-image-pre | 36.4 |
hydra-a-hyper-agent-for-dynamic-compositional | 47.9 |
learning-by-abstraction-the-neural-state | 62.95 |
lyrics-boosting-fine-grained-language-vision | 62.4 |
blip-2-bootstrapping-language-image-pre | 33.9 |
language-conditioned-graph-networks-for | 55.8 |
coarse-to-fine-reasoning-for-visual-question | 72.1 |
blip-2-bootstrapping-language-image-pre | 44.2 |
video-lavit-unified-video-language-pre | 64.4 |
cumo-scaling-multimodal-llm-with-co-upcycled | 64.9 |