HyperAI초신경

Natural Language Inference on ANLI Test

Evaluation Metrics

A1
A2
A3

Evaluation Results

Performance results of each model on this benchmark

Comparison Table
| Model Name | A1 | A2 | A3 |
| --- | --- | --- | --- |
| a-systematic-study-and-comprehensive | 62.3 | 52.6 | 54.1 |
| language-models-are-few-shot-learners | 36.8 | 34 | 40.2 |
| palm-2-technical-report-1 | 53.1 | 48.8 | 53.2 |
| prompting-for-explanations-improves | 75.6 | 60.6 | 59.9 |
| knowledge-in-context-towards-knowledgeable | 36.30 | 35.00 | 37.60 |
| bloomberggpt-a-large-language-model-for | 33.6 | 33.8 | 35.17 |
| bloomberggpt-a-large-language-model-for | 33.1 | 34.2 | 34.92 |
| large-language-models-can-self-improve | - | 64.5 | 63.4 |
| large-language-models-can-self-improve | - | 66.5 | 67.9 |
| exploring-the-benefits-of-training-expert | 35.49 | 34.64 | 31.22 |
| bloomberggpt-a-large-language-model-for | 32.6 | 33.8 | 36.17 |
| prompting-for-explanations-improves | 81.8 | 72.5 | 74.8 |
| infobert-improving-robustness-of-language-1 | 75 | 50.5 | 47.7 |
| palm-2-technical-report-1 | 73.1 | 63.4 | 67.1 |
| the-cot-collection-improving-zero-shot-and | 41.7 | 37.2 | 41.9 |
| large-language-models-can-self-improve | - | 58.9 | 60.6 |
| roberta-a-robustly-optimized-bert-pretraining | 72.4 | 49.8 | 44.4 |
| large-language-models-can-self-improve | - | 64.8 | 66.9 |
| bloomberggpt-a-large-language-model-for | 32.9 | 34.4 | 37.33 |
| xlnet-generalized-autoregressive-pretraining | 70.3 | 50.9 | 49.4 |
| adversarial-training-for-large-neural | 72.3 | 52.1 | 48.4 |
| large-language-models-can-self-improve | - | 55.8 | 55.8 |
| palm-2-technical-report-1 | 58.1 | 49.5 | 54.5 |
| large-language-models-can-self-improve | - | 65.3 | 67.3 |
| guess-the-instruction-making-language-models | 39.99 | 37.05 | 37.73 |