Visual Question Answering on VQA v2 test-dev
Metrics
Accuracy
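For reference, VQA v2 is scored with the consensus-based VQA accuracy metric: each question has ten human answers, a prediction scores min(#humans who gave that answer / 3, 1), and the score is averaged over all leave-one-out subsets of the ten annotations. A minimal sketch of that formula (the official evaluation additionally normalizes answers, e.g. lower-casing and stripping articles and punctuation, which is omitted here):

```python
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Consensus-based VQA accuracy for a single question.

    A prediction is fully correct if at least 3 of the remaining 9
    annotators gave the same answer; scores are averaged over all
    10 leave-one-out subsets of the human answers.
    """
    scores = []
    for i in range(len(human_answers)):
        # Drop one annotator, count matches among the other nine.
        others = human_answers[:i] + human_answers[i + 1:]
        matches = sum(answer == predicted for answer in others)
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)


# Example: 4 of 10 annotators said "blue", so the prediction scores 1.0
# in most subsets and slightly less when a "blue" annotator is held out.
print(vqa_accuracy("blue", ["blue"] * 4 + ["green"] * 6))
```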
Results
Performance results of various models on this benchmark
Comparison table

| Model name | Accuracy (%) |
| --- | --- |
| florence-a-new-foundation-model-for-computer | 80.16 |
| lxmert-model-compression-for-visual-question | 70.72 |
| differentiable-outlier-detection-enable | 76.8 |
| blip-2-bootstrapping-language-image-pre | 82.30 |
| blip-2-bootstrapping-language-image-pre | 81.74 |
| learning-to-localize-objects-improves-spatial | 56.2 |
| blip-2-bootstrapping-language-image-pre | 81.66 |
| unifying-architectures-tasks-and-modalities | 82.0 |
| Model 9 | 77.69 |
| coca-contrastive-captioners-are-image-text | 82.3 |
| mplug-2-a-modularized-multi-modal-foundation | 81.11 |