Visual Reasoning On Winoground
Evaluation Metrics
Group Score
Image Score
Text Score
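The three metrics above can be sketched in code. This is a minimal illustration, assuming the standard Winoground setup: each example has two captions and two images, and the model gives a similarity score to every caption-image pair. The names `winoground_scores`, `text_correct`, `image_correct`, and `group_correct` are hypothetical helpers, not part of any library, and the scores are scaled to percentages on the assumption that this matches the table values.

```python
from typing import Dict, Sequence

# Each example is a 2x2 grid of similarity scores s[c][i], where
# s[c][i] is the model's score for caption c paired with image i (c, i in {0, 1}).
# Caption 0 belongs with image 0, and caption 1 belongs with image 1.
Example = Sequence[Sequence[float]]


def text_correct(s: Example) -> bool:
    # For each image, the matching caption must score higher than the other caption.
    return s[0][0] > s[1][0] and s[1][1] > s[0][1]


def image_correct(s: Example) -> bool:
    # For each caption, the matching image must score higher than the other image.
    return s[0][0] > s[0][1] and s[1][1] > s[1][0]


def group_correct(s: Example) -> bool:
    # An example counts for the group score only if both checks pass.
    return text_correct(s) and image_correct(s)


def winoground_scores(examples: Sequence[Example]) -> Dict[str, float]:
    """Return text, image, and group scores as percentages of examples passed."""
    n = len(examples)
    if n == 0:
        raise ValueError("at least one example is required")
    return {
        "text": 100.0 * sum(text_correct(s) for s in examples) / n,
        "image": 100.0 * sum(image_correct(s) for s in examples) / n,
        "group": 100.0 * sum(group_correct(s) for s in examples) / n,
    }


# Toy similarity grids (made-up numbers, for illustration only).
examples = [
    [[0.9, 0.1], [0.2, 0.8]],  # passes both the text and image checks
    [[0.5, 0.6], [0.4, 0.7]],  # passes the text check only
    [[0.5, 0.4], [0.6, 0.7]],  # passes the image check only
    [[0.1, 0.9], [0.9, 0.1]],  # passes neither check
]
result = winoground_scores(examples)
print(result)
```

Because the group check requires both the text and image checks to pass on the same example, the group score can never exceed either of the other two.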
Evaluation Results
Performance of each model on this benchmark
Comparison Table
Model Name | Group Score | Image Score | Text Score |
---|---|---|---|
winoground-probing-vision-and-language-models | 4.75 | 7.25 | 23.75 |
equivariant-similarity-for-vision-language | 14.75 | 20.75 | 43.5 |
visualgptscore-visio-linguistic-reasoning | 13.3 | 15.8 | 35.8 |
incorporating-structured-representations-into | 23.3 | 28.5 | 42.8 |
cocot-contrastive-chain-of-thought-prompting | 27.75 | 32.5 | 40 |
the-role-of-chain-of-thought-in-complex | 58.75 | 68.75 | 75.25 |
cocot-contrastive-chain-of-thought-prompting | 41.5 | 55.25 | 58.25 |
what-you-see-is-what-you-read-improving-text-1 | 8.25 | 11.50 | 28.25 |
simple-token-level-confidence-improves | 7.25 | 10.25 | 30.75 |
winoground-probing-vision-and-language-models | 3.50 | 5.50 | 18.75 |
equivariant-similarity-for-vision-language | 12.00 | 15.75 | 39.25 |
cocot-contrastive-chain-of-thought-prompting | 33.25 | 41.25 | 39 |
selfeval-leveraging-the-discriminative-nature | - | 7.25 | 22.75 |
selfeval-leveraging-the-discriminative-nature | - | 8.0 | 30.25 |
measuring-progress-in-fine-grained-vision-and | 12.2 | 15.2 | 34.7 |
prompting-large-vision-language-models-for | 18.2 | 28.7 | 43.5 |
an-examination-of-the-compositionality-of | 10.50 | 17.00 | 25.50 |
what-you-see-is-what-you-read-improving-text-1 | 11.30 | 12.50 | 19.00 |
your-diffusion-model-is-secretly-a-zero-shot | - | - | 34.00 |
prompting-large-vision-language-models-for | 17.4 | 27.8 | 42.7 |
measuring-progress-in-fine-grained-vision-and | 21.5 | 26.7 | 44.0 |
equivariant-similarity-for-vision-language | 18.75 | 22.75 | 45.0 |
going-beyond-nouns-with-vision-language | 7.25 | 9.50 | 28.50 |
compositional-chain-of-thought-prompting-for | 4.0 | 16.3 | 9.3 |
winoground-probing-vision-and-language-models | 8.00 | 12.00 | 30.00 |
visualgptscore-visio-linguistic-reasoning | 16.8 | 21.5 | 36.5 |
Model 27 | 5.0 | 20.8 | 16.8 |
simple-token-level-confidence-improves | 6.75 | 15.75 | 16.50 |
vilem-visual-language-error-modeling-for | - | - | 31.2 |
winoground-probing-vision-and-language-models | 11.00 | 13.25 | 37.00 |
simple-token-level-confidence-improves | 4.50 | 7.75 | 22.75 |
measuring-progress-in-fine-grained-vision-and | 12.2 | 15.7 | 33.2 |
incorporating-structured-representations-into | 15.0 | 19.2 | 39.0 |
winoground-probing-vision-and-language-models | 8.00 | 10.50 | 30.75 |
winoground-probing-vision-and-language-models | 3.75 | 7.00 | 17.50 |
cocot-contrastive-chain-of-thought-prompting | 37.75 | 42.5 | 54.5 |
incorporating-structured-representations-into | 20.5 | 26.0 | 41.5 |
cocot-contrastive-chain-of-thought-prompting | 44.5 | 49.5 | 58.5 |
selfeval-leveraging-the-discriminative-nature | - | 12.75 | 30.75 |
cocot-contrastive-chain-of-thought-prompting | 47.5 | 48 | 51 |
Model 41 | 38.00 | 38.00 | 38.00 |
Model 42 | 8.0 | 22.5 | 18.75 |
cocot-contrastive-chain-of-thought-prompting | 25 | 26 | 30.75 |
what-you-see-is-what-you-read-improving-text-1 | 30.5 | 42.2 | 47 |
winoground-probing-vision-and-language-models | 16.67 | 25.00 | 25.00 |
selfeval-leveraging-the-discriminative-nature | - | 14.00 | 17.00 |
incorporating-structured-representations-into | 9.5 | 18.0 | 23.3 |
measuring-progress-in-fine-grained-vision-and | 21.2 | 24.5 | 46.7 |
cocot-contrastive-chain-of-thought-prompting | 50.75 | 52.5 | 64.25 |
does-structural-attention-improve | 16.00 | 19.75 | 42.50 |
incorporating-structured-representations-into | 21.5 | 27.3 | 42.8 |
incorporating-structured-representations-into | 8.0 | 10.5 | 29.5 |
does-structural-attention-improve | 12.25 | 15.25 | 35.25 |
measuring-progress-in-fine-grained-vision-and | 11.7 | 15.0 | 35.5 |
compositional-chain-of-thought-prompting-for | 8.3 | 21.3 | 21.0 |
cocot-contrastive-chain-of-thought-prompting | 39 | 47.25 | 47.5 |
incorporating-structured-representations-into | 16.5 | 20.5 | 40.3 |
going-beyond-nouns-with-vision-language | 9.50 | 11.50 | 30.00 |
incorporating-structured-representations-into | 18.5 | 24.0 | 42.5 |
an-examination-of-the-compositionality-of | 11.50 | 21.75 | 24.50 |
winoground-probing-vision-and-language-models | 2.75 | 5.00 | 20.00 |
measuring-progress-in-fine-grained-vision-and | 12.7 | 16.2 | 32.5 |
what-you-see-is-what-you-read-improving-text-1 | 28.75 | 38 | 46.5 |
winoground-probing-vision-and-language-models | 4.00 | 6.25 | 19.50 |
mmicl-empowering-vision-language-model-with | 43.00 | 44.99 | 45.50 |
what-you-see-is-what-you-read-improving-text-1 | 28.70 | 41.50 | 45.00 |
the-role-of-chain-of-thought-in-complex | 39.25 | 46.25 | 69.25 |
winoground-probing-vision-and-language-models | 9.25 | 14.00 | 34.75 |
simple-token-level-confidence-improves | 13.75 | 23.50 | 24.50 |
winoground-probing-vision-and-language-models | 10.00 | 13.25 | 32.25 |
vilem-visual-language-error-modeling-for | - | - | 36.5 |
winoground-probing-vision-and-language-models | 4.00 | 8.00 | 22.75 |
incorporating-structured-representations-into | 9.8 | 14.0 | 32.0 |
winoground-probing-vision-and-language-models | 14.50 | 17.75 | 37.75 |
prompting-large-vision-language-models-for | 12.4 | 24.6 | 30.3 |
equivariant-similarity-for-vision-language | 22.25 | 25.75 | 46.25 |
incorporating-structured-representations-into | 19.0 | 23.8 | 42.0 |
winoground-probing-vision-and-language-models | 10.50 | 14.00 | 38.00 |
an-examination-of-the-compositionality-of | 9.50 | 18.00 | 23.25 |
measuring-progress-in-fine-grained-vision-and | 14.5 | 18.5 | 36.5 |
an-examination-of-the-compositionality-of | 2.25 | 5.25 | 13.50 |
compositional-chain-of-thought-prompting-for | 12.3 | 22.5 | 28.0 |
going-beyond-nouns-with-vision-language | 8.25 | 10.75 | 30.00 |
selfeval-leveraging-the-discriminative-nature | - | 13.50 | 29.00 |
does-structural-attention-improve | 14.25 | 17.75 | 39.25 |
winoground-probing-vision-and-language-models | 4.00 | 7.00 | 19.25 |
simple-token-level-confidence-improves | 17.50 | 27.00 | 29.25 |
winoground-probing-vision-and-language-models | 9.00 | 13.50 | 25.25 |
measuring-progress-in-fine-grained-vision-and | 12.2 | 14.5 | 34.7 |
what-you-see-is-what-you-read-improving-text-1 | 23.50 | 26.00 | 44.00 |
incorporating-structured-representations-into | 13.0 | 25.0 | 24.8 |
winoground-probing-vision-and-language-models | 4.50 | 6.25 | 19.75 |
winoground-probing-vision-and-language-models | 14.25 | 20.50 | 32.25 |
compositional-chain-of-thought-prompting-for | 20.1 | 33.3 | 36.0 |
does-structural-attention-improve | 15.50 | 19.75 | 41.75 |
winoground-probing-vision-and-language-models | 1.50 | 2.50 | 15.50 |
incorporating-structured-representations-into | 19.0 | 25.5 | 40.5 |
an-examination-of-the-compositionality-of | 2.75 | 8.00 | 14.00 |
compositional-chain-of-thought-prompting-for | 3.3 | 11.5 | 7.0 |
equivariant-similarity-for-vision-language | 23.00 | 26.50 | 51.25 |
selfeval-leveraging-the-discriminative-nature | - | 12.00 | 28.25 |
compositional-chain-of-thought-prompting-for | 22.3 | 35.5 | 42.0 |
measuring-progress-in-fine-grained-vision-and | 11.0 | 15.5 | 29.2 |
cocot-contrastive-chain-of-thought-prompting | 36.75 | 45 | 46.75 |
cocot-contrastive-chain-of-thought-prompting | 20 | 27.5 | 42.5 |
cocot-contrastive-chain-of-thought-prompting | 23.75 | 25 | 45 |
cocot-contrastive-chain-of-thought-prompting | 20.75 | 33 | 22.5 |
equivariant-similarity-for-vision-language | 27.5 | 32.00 | 51.5 |
what-you-see-is-what-you-read-improving-text-1 | 10.25 | 13.75 | 26.50 |
what-you-see-is-what-you-read-improving-text-1 | 9.00 | 14.30 | 27.70 |
visualgptscore-visio-linguistic-reasoning | 6.5 | 9.0 | 28.0 |
simple-token-level-confidence-improves | 6.50 | 10.75 | 26.75 |
winoground-probing-vision-and-language-models | 3.50 | 5.00 | 20.00 |