Visual Reasoning on NLVR2 Test
Metrics
Accuracy
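Accuracy is the standard fraction of correctly classified test examples (in NLVR2, each example asks whether a natural-language statement is true of a pair of images); the values in the table below are percentages:

$$\text{Accuracy} = \frac{\text{number of correct predictions}}{\text{number of test examples}} \times 100$$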
Results
Performance results of various models on this benchmark
Comparative table
| Model name | Accuracy (%) |
|---|---|
| coca-contrastive-captioners-are-image-text | 87.0 |
| uniter-learning-universal-image-text-1 | 79.5 |
| simvlm-simple-visual-language-model | 85.15 |
| vlmo-unified-vision-language-pre-training | 86.86 |
| blip-bootstrapping-language-image-pre | 83.09 |
| x-2-vlm-all-in-one-pre-trained-model-for | 89.4 |
| x-2-vlm-all-in-one-pre-trained-model-for | 87.0 |
| multi-grained-vision-language-pre-training | 84.76 |
| seeing-out-of-the-box-end-to-end-pre-training | 77.32 |
| lxmert-learning-cross-modality-encoder | 76.2 |
| vilt-vision-and-language-transformer-without | 76.13 |
| align-before-fuse-vision-and-language | 82.55 |
| image-as-a-foreign-language-beit-pretraining | 92.58 |
| toward-building-general-foundation-models-for | 88.4 |
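For a quick ranked comparison, the table entries can be sorted by accuracy programmatically. Below is a minimal Python sketch; the (model, accuracy) pairs are transcribed directly from the table above, and no external API or data source is assumed.

```python
# Minimal sketch: rank the NLVR2 test-set results from the table above by accuracy.
# The pairs below are copied verbatim from the comparative table.
results = [
    ("coca-contrastive-captioners-are-image-text", 87.0),
    ("uniter-learning-universal-image-text-1", 79.5),
    ("simvlm-simple-visual-language-model", 85.15),
    ("vlmo-unified-vision-language-pre-training", 86.86),
    ("blip-bootstrapping-language-image-pre", 83.09),
    ("x-2-vlm-all-in-one-pre-trained-model-for", 89.4),
    ("x-2-vlm-all-in-one-pre-trained-model-for", 87.0),
    ("multi-grained-vision-language-pre-training", 84.76),
    ("seeing-out-of-the-box-end-to-end-pre-training", 77.32),
    ("lxmert-learning-cross-modality-encoder", 76.2),
    ("vilt-vision-and-language-transformer-without", 76.13),
    ("align-before-fuse-vision-and-language", 82.55),
    ("image-as-a-foreign-language-beit-pretraining", 92.58),
    ("toward-building-general-foundation-models-for", 88.4),
]

# Sort by accuracy, highest first, and print a ranked leaderboard.
for rank, (model, acc) in enumerate(
    sorted(results, key=lambda r: r[1], reverse=True), start=1
):
    print(f"{rank:2d}. {model:<50s} {acc:.2f}")
```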