HyperAI超神経

Visual Reasoning On Winoground

Evaluation Metrics

Group Score
Image Score
Text Score
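In Winoground, each example pairs two images with two captions that use the same words in a different order. The three metrics compare a model's match score s(caption, image) within that pair. Text Score is the share of examples where, for each image, the correct caption scores strictly higher than the other caption. Image Score is the share where, for each caption, the correct image scores strictly higher than the other image. Group Score is the share where both conditions hold. The sketch below is a minimal Python illustration of these definitions. It assumes the model is exposed as a hypothetical callable `s(caption, image)` and uses a made-up toy similarity table; `winoground_scores` and the other names are illustrative, not part of any library.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical layout of one Winoground example: (image_0, image_1, caption_0,
# caption_1), where caption_k is the correct caption for image_k.
Example = Tuple[object, object, str, str]

# Hypothetical model interface: s(caption, image) -> match score, higher = better.
ScoreFn = Callable[[str, object], float]


def text_correct(s: ScoreFn, ex: Example) -> bool:
    """For each image, the correct caption must score strictly higher."""
    i0, i1, c0, c1 = ex
    return s(c0, i0) > s(c1, i0) and s(c1, i1) > s(c0, i1)


def image_correct(s: ScoreFn, ex: Example) -> bool:
    """For each caption, the correct image must score strictly higher."""
    i0, i1, c0, c1 = ex
    return s(c0, i0) > s(c0, i1) and s(c1, i1) > s(c1, i0)


def winoground_scores(s: ScoreFn, examples: Sequence[Example]) -> Dict[str, float]:
    """Return Group/Image/Text Score as percentages of the examples."""
    n = len(examples)
    text = sum(text_correct(s, ex) for ex in examples)
    image = sum(image_correct(s, ex) for ex in examples)
    # An example counts toward Group Score only if it passes both tests.
    group = sum(text_correct(s, ex) and image_correct(s, ex) for ex in examples)
    return {
        "group": 100.0 * group / n,
        "image": 100.0 * image / n,
        "text": 100.0 * text / n,
    }


def toy_score(caption: str, image: object) -> float:
    # Made-up similarity values standing in for a real model.
    table = {
        ("a dog bites a man", "img_0"): 0.9,
        ("a man bites a dog", "img_0"): 0.4,
        ("a dog bites a man", "img_1"): 0.6,
        ("a man bites a dog", "img_1"): 0.5,
    }
    return table[(caption, image)]


example = ("img_0", "img_1", "a dog bites a man", "a man bites a dog")
print(winoground_scores(toy_score, [example]))
# → {'group': 0.0, 'image': 100.0, 'text': 0.0}
```

Here the toy model picks the right image for both captions but prefers the wrong caption for `img_1`, so the example counts toward Image Score only. Because the comparisons are strict, ties count as failures.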

Evaluation Results

Performance of each model on this benchmark. All scores are percentages; "-" marks a score that was not reported.

Comparison Table
Model                                           Group Score  Image Score  Text Score
winoground-probing-vision-and-language-models   4.75         7.25         23.75
equivariant-similarity-for-vision-language      14.75        20.75        43.5
visualgptscore-visio-linguistic-reasoning       13.3         15.8         35.8
incorporating-structured-representations-into   23.3         28.5         42.8
cocot-contrastive-chain-of-thought-prompting    27.75        32.5         40
the-role-of-chain-of-thought-in-complex         58.75        68.75        75.25
cocot-contrastive-chain-of-thought-prompting    41.5         55.25        58.25
what-you-see-is-what-you-read-improving-text-1  8.25         11.50        28.25
simple-token-level-confidence-improves          7.25         10.25        30.75
winoground-probing-vision-and-language-models   3.50         5.50         18.75
equivariant-similarity-for-vision-language      12.00        15.75        39.25
cocot-contrastive-chain-of-thought-prompting    33.25        41.25        39
selfeval-leveraging-the-discriminative-nature   -            7.25         22.75
selfeval-leveraging-the-discriminative-nature   -            8.0          30.25
measuring-progress-in-fine-grained-vision-and   12.2         15.2         34.7
prompting-large-vision-language-models-for      18.2         28.7         43.5
an-examination-of-the-compositionality-of       10.50        17.00        25.50
what-you-see-is-what-you-read-improving-text-1  11.30        12.50        19.00
your-diffusion-model-is-secretly-a-zero-shot    -            -            34.00
prompting-large-vision-language-models-for      17.4         27.8         42.7
measuring-progress-in-fine-grained-vision-and   21.5         26.7         44.0
equivariant-similarity-for-vision-language      18.75        22.75        45.0
going-beyond-nouns-with-vision-language         7.25         9.50         28.50
compositional-chain-of-thought-prompting-for    4.0          16.3         9.3
winoground-probing-vision-and-language-models   8.00         12.00        30.00
visualgptscore-visio-linguistic-reasoning       16.8         21.5         36.5
Model 27                                        5.0          20.8         16.8
simple-token-level-confidence-improves          6.75         15.75        16.50
vilem-visual-language-error-modeling-for        -            -            31.2
winoground-probing-vision-and-language-models   11.00        13.25        37.00
simple-token-level-confidence-improves          4.50         7.75         22.75
measuring-progress-in-fine-grained-vision-and   12.2         15.7         33.2
incorporating-structured-representations-into   15.0         19.2         39.0
winoground-probing-vision-and-language-models   8.00         10.50        30.75
winoground-probing-vision-and-language-models   3.75         7.00         17.50
cocot-contrastive-chain-of-thought-prompting    37.75        42.5         54.5
incorporating-structured-representations-into   20.5         26.0         41.5
cocot-contrastive-chain-of-thought-prompting    44.5         49.5         58.5
selfeval-leveraging-the-discriminative-nature   -            12.75        30.75
cocot-contrastive-chain-of-thought-prompting    47.5         48           51
Model 41                                        38.00        38.00        38.00
Model 42                                        8.0          22.5         18.75
cocot-contrastive-chain-of-thought-prompting    25           26           30.75
what-you-see-is-what-you-read-improving-text-1  30.5         42.2         47
winoground-probing-vision-and-language-models   16.67        25.00        25.00
selfeval-leveraging-the-discriminative-nature   -            14.00        17.00
incorporating-structured-representations-into   9.5          18.0         23.3
measuring-progress-in-fine-grained-vision-and   21.2         24.5         46.7
cocot-contrastive-chain-of-thought-prompting    50.75        52.5         64.25
does-structural-attention-improve               16.00        19.75        42.50
incorporating-structured-representations-into   21.5         27.3         42.8
incorporating-structured-representations-into   8.0          10.5         29.5
does-structural-attention-improve               12.25        15.25        35.25
measuring-progress-in-fine-grained-vision-and   11.7         15.0         35.5
compositional-chain-of-thought-prompting-for    8.3          21.3         21.0
cocot-contrastive-chain-of-thought-prompting    39           47.25        47.5
incorporating-structured-representations-into   16.5         20.5         40.3
going-beyond-nouns-with-vision-language         9.50         11.50        30.00
incorporating-structured-representations-into   18.5         24.0         42.5
an-examination-of-the-compositionality-of       11.50        21.75        24.50
winoground-probing-vision-and-language-models   2.75         5.00         20.00
measuring-progress-in-fine-grained-vision-and   12.7         16.2         32.5
what-you-see-is-what-you-read-improving-text-1  28.75        38           46.5
winoground-probing-vision-and-language-models   4.00         6.25         19.50
mmicl-empowering-vision-language-model-with     43.00        44.99        45.50
what-you-see-is-what-you-read-improving-text-1  28.70        41.50        45.00
the-role-of-chain-of-thought-in-complex         39.25        46.25        69.25
winoground-probing-vision-and-language-models   9.25         14.00        34.75
simple-token-level-confidence-improves          13.75        23.50        24.50
winoground-probing-vision-and-language-models   10.00        13.25        32.25
vilem-visual-language-error-modeling-for        -            -            36.5
winoground-probing-vision-and-language-models   4.00         8.00         22.75
incorporating-structured-representations-into   9.8          14.0         32.0
winoground-probing-vision-and-language-models   14.50        17.75        37.75
prompting-large-vision-language-models-for      12.4         24.6         30.3
equivariant-similarity-for-vision-language      22.25        25.75        46.25
incorporating-structured-representations-into   19.0         23.8         42.0
winoground-probing-vision-and-language-models   10.50        14.00        38.00
an-examination-of-the-compositionality-of       9.50         18.00        23.25
measuring-progress-in-fine-grained-vision-and   14.5         18.5         36.5
an-examination-of-the-compositionality-of       2.25         5.25         13.50
compositional-chain-of-thought-prompting-for    12.3         22.5         28.0
going-beyond-nouns-with-vision-language         8.25         10.75        30.00
selfeval-leveraging-the-discriminative-nature   -            13.50        29.00
does-structural-attention-improve               14.25        17.75        39.25
winoground-probing-vision-and-language-models   4.00         7.00         19.25
simple-token-level-confidence-improves          17.50        27.00        29.25
winoground-probing-vision-and-language-models   9.00         13.50        25.25
measuring-progress-in-fine-grained-vision-and   12.2         14.5         34.7
what-you-see-is-what-you-read-improving-text-1  23.50        26.00        44.00
incorporating-structured-representations-into   13.0         25.0         24.8
winoground-probing-vision-and-language-models   4.50         6.25         19.75
winoground-probing-vision-and-language-models   14.25        20.50        32.25
compositional-chain-of-thought-prompting-for    20.1         33.3         36.0
does-structural-attention-improve               15.50        19.75        41.75
winoground-probing-vision-and-language-models   1.50         2.50         15.50
incorporating-structured-representations-into   19.0         25.5         40.5
an-examination-of-the-compositionality-of       2.75         8.00         14.00
compositional-chain-of-thought-prompting-for    3.3          11.5         7.0
equivariant-similarity-for-vision-language      23.00        26.50        51.25
selfeval-leveraging-the-discriminative-nature   -            12.00        28.25
compositional-chain-of-thought-prompting-for    22.3         35.5         42.0
measuring-progress-in-fine-grained-vision-and   11.0         15.5         29.2
cocot-contrastive-chain-of-thought-prompting    36.75        45           46.75
cocot-contrastive-chain-of-thought-prompting    20           27.5         42.5
cocot-contrastive-chain-of-thought-prompting    23.75        25           45
cocot-contrastive-chain-of-thought-prompting    20.75        33           22.5
equivariant-similarity-for-vision-language      27.5         32.00        51.5
what-you-see-is-what-you-read-improving-text-1  10.25        13.75        26.50
what-you-see-is-what-you-read-improving-text-1  9.00         14.30        27.70
visualgptscore-visio-linguistic-reasoning       6.5          9.0          28.0
simple-token-level-confidence-improves          6.50         10.75        26.75
winoground-probing-vision-and-language-models   3.50         5.00         20.00
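Group Score counts only examples that pass both the text and the image test, so in any row it can never exceed that row's Image Score or Text Score. This makes a quick sanity check for leaderboard data. The sketch below is a minimal Python illustration: it holds a handful of rows copied from the table (a "-" cell becomes `None`), flags any row that breaks the bound, and ranks rows by Group Score. The helper names `violates_group_bound` and `rank_by_group` are hypothetical.

```python
from typing import List, Optional, Tuple

# (model slug, Group Score, Image Score, Text Score); None stands for a "-" cell.
Row = Tuple[str, Optional[float], Optional[float], Optional[float]]

# A few rows copied from the comparison table; the rest can be added the same way.
ROWS: List[Row] = [
    ("winoground-probing-vision-and-language-models", 4.75, 7.25, 23.75),
    ("the-role-of-chain-of-thought-in-complex", 58.75, 68.75, 75.25),
    ("mmicl-empowering-vision-language-model-with", 43.00, 44.99, 45.50),
    ("selfeval-leveraging-the-discriminative-nature", None, 7.25, 22.75),
    ("cocot-contrastive-chain-of-thought-prompting", 50.75, 52.5, 64.25),
]


def violates_group_bound(row: Row) -> bool:
    """True if Group Score exceeds a reported Image or Text Score (impossible)."""
    _, group, image, text = row
    if group is None:
        return False  # nothing to check without a Group Score
    return any(other is not None and group > other for other in (image, text))


def rank_by_group(rows: List[Row]) -> List[Row]:
    """Sort by Group Score, highest first; rows without one go last."""
    return sorted(rows, key=lambda r: (r[1] is None, -(r[1] or 0.0)))


print([r[0] for r in ROWS if violates_group_bound(r)])  # → []
for slug, group, image, text in rank_by_group(ROWS):
    print(slug, group, image, text)
```

Sorting on Group Score rather than Text Score follows from the metric definitions: Group Score is the strictest of the three, because a model must be right in both directions on the same example.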