Natural Language Inference on ANLI Test
Evaluation Metrics
A1 (ANLI Round 1 test set)
A2 (ANLI Round 2 test set)
A3 (ANLI Round 3 test set)
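The sketch below shows how a per-round score of this kind could be computed. It assumes that each of A1, A2 and A3 is plain classification accuracy, in percent, on that round's test examples. The example triples are hypothetical and are not taken from any model in the table.

```python
from collections import defaultdict

def per_round_accuracy(examples):
    """Return {round_name: accuracy in percent} for (round, gold, pred) triples.

    Assumption: the leaderboard metric for each round is simple accuracy.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for round_name, gold, pred in examples:
        total[round_name] += 1
        if gold == pred:
            correct[round_name] += 1
    return {r: 100.0 * correct[r] / total[r] for r in total}

# Hypothetical predictions over ANLI's three labels
# (entailment, neutral, contradiction), tagged with their round.
examples = [
    ("A1", "entailment", "entailment"),
    ("A1", "neutral", "contradiction"),
    ("A2", "contradiction", "contradiction"),
    ("A2", "neutral", "neutral"),
    ("A3", "entailment", "neutral"),
]
print(per_round_accuracy(examples))  # {'A1': 50.0, 'A2': 100.0, 'A3': 0.0}
```

Each round is scored on its own, with no averaging across rounds. This matches the table's layout of one column per round.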
Evaluation Results
Results of each model on this benchmark
Comparison Table
Model | A1 | A2 | A3 |
---|---|---|---|
a-systematic-study-and-comprehensive | 62.3 | 52.6 | 54.1 |
language-models-are-few-shot-learners | 36.8 | 34 | 40.2 |
palm-2-technical-report-1 | 53.1 | 48.8 | 53.2 |
prompting-for-explanations-improves | 75.6 | 60.6 | 59.9 |
knowledge-in-context-towards-knowledgeable | 36.30 | 35.00 | 37.60 |
bloomberggpt-a-large-language-model-for | 33.6 | 33.8 | 35.17 |
bloomberggpt-a-large-language-model-for | 33.1 | 34.2 | 34.92 |
large-language-models-can-self-improve | - | 64.5 | 63.4 |
large-language-models-can-self-improve | - | 66.5 | 67.9 |
exploring-the-benefits-of-training-expert | 35.49 | 34.64 | 31.22 |
bloomberggpt-a-large-language-model-for | 32.6 | 33.8 | 36.17 |
prompting-for-explanations-improves | 81.8 | 72.5 | 74.8 |
infobert-improving-robustness-of-language-1 | 75 | 50.5 | 47.7 |
palm-2-technical-report-1 | 73.1 | 63.4 | 67.1 |
the-cot-collection-improving-zero-shot-and | 41.7 | 37.2 | 41.9 |
large-language-models-can-self-improve | - | 58.9 | 60.6 |
roberta-a-robustly-optimized-bert-pretraining | 72.4 | 49.8 | 44.4 |
large-language-models-can-self-improve | - | 64.8 | 66.9 |
bloomberggpt-a-large-language-model-for | 32.9 | 34.4 | 37.33 |
xlnet-generalized-autoregressive-pretraining | 70.3 | 50.9 | 49.4 |
adversarial-training-for-large-neural | 72.3 | 52.1 | 48.4 |
large-language-models-can-self-improve | - | 55.8 | 55.8 |
palm-2-technical-report-1 | 58.1 | 49.5 | 54.5 |
large-language-models-can-self-improve | - | 65.3 | 67.3 |
guess-the-instruction-making-language-models | 39.99 | 37.05 | 37.73 |
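To read off the top entry for each round, the table rows can be parsed and ranked per column. The following minimal sketch copies the 25 rows above verbatim and treats "-" as "not reported". Several paper slugs appear more than once, and the table does not say which variant each row is, so every row is kept as a separate entry. The helper names `parse_rows` and `best_per_round` are illustrative only.

```python
TABLE = """\
a-systematic-study-and-comprehensive | 62.3 | 52.6 | 54.1 |
language-models-are-few-shot-learners | 36.8 | 34 | 40.2 |
palm-2-technical-report-1 | 53.1 | 48.8 | 53.2 |
prompting-for-explanations-improves | 75.6 | 60.6 | 59.9 |
knowledge-in-context-towards-knowledgeable | 36.30 | 35.00 | 37.60 |
bloomberggpt-a-large-language-model-for | 33.6 | 33.8 | 35.17 |
bloomberggpt-a-large-language-model-for | 33.1 | 34.2 | 34.92 |
large-language-models-can-self-improve | - | 64.5 | 63.4 |
large-language-models-can-self-improve | - | 66.5 | 67.9 |
exploring-the-benefits-of-training-expert | 35.49 | 34.64 | 31.22 |
bloomberggpt-a-large-language-model-for | 32.6 | 33.8 | 36.17 |
prompting-for-explanations-improves | 81.8 | 72.5 | 74.8 |
infobert-improving-robustness-of-language-1 | 75 | 50.5 | 47.7 |
palm-2-technical-report-1 | 73.1 | 63.4 | 67.1 |
the-cot-collection-improving-zero-shot-and | 41.7 | 37.2 | 41.9 |
large-language-models-can-self-improve | - | 58.9 | 60.6 |
roberta-a-robustly-optimized-bert-pretraining | 72.4 | 49.8 | 44.4 |
large-language-models-can-self-improve | - | 64.8 | 66.9 |
bloomberggpt-a-large-language-model-for | 32.9 | 34.4 | 37.33 |
xlnet-generalized-autoregressive-pretraining | 70.3 | 50.9 | 49.4 |
adversarial-training-for-large-neural | 72.3 | 52.1 | 48.4 |
large-language-models-can-self-improve | - | 55.8 | 55.8 |
palm-2-technical-report-1 | 58.1 | 49.5 | 54.5 |
large-language-models-can-self-improve | - | 65.3 | 67.3 |
guess-the-instruction-making-language-models | 39.99 | 37.05 | 37.73 |
"""

def parse_rows(text):
    """Split each 'model | a1 | a2 | a3 |' line into (model, [scores])."""
    rows = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        model, scores = cells[0], cells[1:4]
        # "-" marks a score that was not reported for that round.
        rows.append((model, [None if s == "-" else float(s) for s in scores]))
    return rows

def best_per_round(rows, rounds=("A1", "A2", "A3")):
    """Return {round: (model, score)} for the highest reported score per round."""
    best = {}
    for i, name in enumerate(rounds):
        scored = [(m, s[i]) for m, s in rows if s[i] is not None]
        best[name] = max(scored, key=lambda pair: pair[1])
    return best

rows = parse_rows(TABLE)
for round_name, (model, score) in best_per_round(rows).items():
    print(round_name, model, score)
```

On these rows, the highest score in every round belongs to one of the `prompting-for-explanations-improves` entries (81.8, 72.5 and 74.8). Rows marked "-" are skipped only for the round they leave out, so they still count in the rounds where they report a score.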