Visual Question Answering on VQA v2 test-dev
Metrics
Accuracy
Results
Performance results of various models on this benchmark
Comparison table
| Model name | Accuracy |
|---|---|
| florence-a-new-foundation-model-for-computer | 80.16 |
| lxmert-model-compression-for-visual-question | 70.72 |
| differentiable-outlier-detection-enable | 76.8 |
| blip-2-bootstrapping-language-image-pre | 82.30 |
| blip-2-bootstrapping-language-image-pre | 81.74 |
| learning-to-localize-objects-improves-spatial | 56.2 |
| blip-2-bootstrapping-language-image-pre | 81.66 |
| unifying-architectures-tasks-and-modalities | 82.0 |
| Model 9 | 77.69 |
| coca-contrastive-captioners-are-image-text | 82.3 |
| mplug-2-a-modularized-multi-modal-foundation | 81.11 |
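The accuracy reported above follows the VQA consensus metric, where each question has ten human-annotated answers and a prediction earns credit proportional to how many annotators agree with it, capped at 1. A minimal sketch of that per-answer score (simplified: the official evaluation also normalizes answer strings and averages over all subsets of nine annotators):

```python
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Per-question VQA accuracy: min(#matching annotators / 3, 1).

    Simplified sketch -- the official metric additionally averages this
    score over every 9-annotator subset of the 10 human answers.
    """
    matches = sum(1 for a in human_answers if a == predicted)
    return min(matches / 3.0, 1.0)


# Example: 10 annotator answers for one question
answers = ["blue"] * 7 + ["green"] * 3
print(vqa_accuracy("blue", answers))   # 3+ annotators agree -> 1.0
print(vqa_accuracy("green", answers))  # exactly 3 agree -> 1.0
print(vqa_accuracy("red", answers))    # no agreement -> 0.0
```

The benchmark-level number is the mean of this score over all test-dev questions, expressed as a percentage.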