HyperAI

Visual Reasoning on NLVR2 Dev

Metrics

Accuracy
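As a rough illustration of the metric, accuracy here is the percentage of examples for which the predicted label matches the gold label. The sketch below assumes each NLVR2 example carries a binary True/False label; the function name `accuracy` and the toy data are hypothetical and are not taken from any benchmark tooling.

```python
def accuracy(predictions, labels):
    """Return the percentage of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count positions where the prediction equals the gold label.
    correct = sum(p == g for p, g in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Hypothetical toy data (assumption): four examples, three predicted correctly.
gold = [True, False, True, True]
pred = [True, False, False, True]
print(accuracy(pred, gold))  # 75.0
```

The scores in the comparison table are on this 0 to 100 scale.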

Results

Performance results of various models on this benchmark

Comparison Table
Model Name                                      Accuracy (%)
toward-building-general-foundation-models-for   87.6
vlmo-unified-vision-language-pre-training       85.64
visualbert-a-simple-and-performant-baseline     66.7
differentiable-outlier-detection-enable         83.9
coca-contrastive-captioners-are-image-text      86.1
multi-grained-vision-language-pre-training      84.41
x-2-vlm-all-in-one-pre-trained-model-for        88.7
seeing-out-of-the-box-end-to-end-pre-training   76.37
align-before-fuse-vision-and-language           83.14
implicit-differentiable-outlier-detection       84.6
vilt-vision-and-language-transformer-without    75.7
simvlm-simple-visual-language-model             84.53
image-as-a-foreign-language-beit-pretraining    91.51
x-2-vlm-all-in-one-pre-trained-model-for        86.2
lxmert-learning-cross-modality-encoder          74.9