Common Sense Reasoning on WinoGrande
Metrics
Accuracy
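Since the leaderboard reports accuracy, here is a minimal sketch of how that score is computed; the function and variable names are illustrative and not tied to any particular evaluation harness. WinoGrande items are two-way choices, so a random baseline scores about 50%, which explains the cluster of entries near that value below.

```python
# Minimal sketch: accuracy = fraction of items where the predicted option
# matches the gold answer. `predictions` and `gold` are illustrative names,
# not part of any specific benchmark toolkit.
def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Return the fraction of exact matches between predictions and gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must be the same length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# WinoGrande answers are option indices ("1" or "2"); 3 of 4 correct -> 75.0%.
print(round(100 * accuracy(["1", "2", "2", "1"], ["1", "2", "1", "1"]), 1))  # 75.0
```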
Results
Performance results of the various models on this benchmark.
Comparison Table
| Model Name | Accuracy (%) |
| --- | --- |
| task-compass-scaling-multi-task-pre-training | 90.5 |
| finetuned-language-models-are-zero-shot | 72.8 |
| unifiedqa-crossing-format-boundaries-with-a | 73.3 |
| efficient-language-modeling-with-sparse-all | 53.4 |
| language-models-are-few-shot-learners | 57.4 |
| pythia-a-suite-for-analyzing-large-language | 66.6 |
| back-to-square-one-bias-detection-training | 54.9 |
| designing-effective-sparse-expert-models | 81.7 |
| textbooks-are-all-you-need-ii-phi-1-5 | 74.0 |
| llama-open-and-efficient-foundation-language-1 | 73.0 |
| bloomberggpt-a-large-language-model-for | 66.1 |
| mixtral-of-experts | 74.2 |
| guess-the-instruction-making-language-models | 58.56 |
| gpt-4-technical-report-1 | 87.5 |
| efficient-language-modeling-with-sparse-all | 51.0 |
| exploring-the-benefits-of-training-expert | 61.60 |
| pythia-a-suite-for-analyzing-large-language | 63.9 |
| winogrande-an-adversarial-winograd-schema | 51.9 |
| winogrande-an-adversarial-winograd-schema | 50.0 |
| palm-2-technical-report-1 | 77.9 |
| efficient-language-modeling-with-sparse-all | 51.7 |
| the-claude-3-model-family-opus-sonnet-haiku | 88.5 |
| bloomberggpt-a-large-language-model-for | 64.1 |
| efficient-language-modeling-with-sparse-all | 54.3 |
| llama-open-and-efficient-foundation-language-1 | 76.0 |
| unifiedqa-crossing-format-boundaries-with-a | 89.4 |
| efficient-language-modeling-with-sparse-all | 51.1 |
| winogrande-an-adversarial-winograd-schema | 64.9 |
| lamini-lm-a-diverse-herd-of-distilled-models | 59.9 |
| palm-scaling-language-modeling-with-pathways-1 | 77.0 |
| the-claude-3-model-family-opus-sonnet-haiku | 75.1 |
| task-compass-scaling-multi-task-pre-training | 89.6 |
| winogrande-an-adversarial-winograd-schema | 79.1 |
| lamini-lm-a-diverse-herd-of-distilled-models | 56.0 |
| winogrande-an-adversarial-winograd-schema | 51.0 |
| back-to-square-one-bias-detection-training | 55.6 |
| mixlora-enhancing-large-language-models-fine | 82.1 |
| palm-2-technical-report-1 | 79.2 |
| back-to-square-one-bias-detection-training | 53.1 |
| palm-scaling-language-modeling-with-pathways-1 | 81.1 |
| pythia-a-suite-for-analyzing-large-language | 59.4 |
| branch-train-mix-mixing-expert-llms-into-a | 70.6 |
| bloomberggpt-a-large-language-model-for | 67.0 |
| llama-open-and-efficient-foundation-language-1 | 70.1 |
| mixture-of-subspaces-in-low-rank-adaptation | 85.8 |
| lamini-lm-a-diverse-herd-of-distilled-models | 54.9 |
| training-compute-optimal-large-language | 74.9 |
| designing-effective-sparse-expert-models | 96.1 |
| the-cot-collection-improving-zero-shot-and | 57.5 |
| llama-open-and-efficient-foundation-language-1 | 77.0 |
| task-compass-scaling-multi-task-pre-training | 87.0 |
| lamini-lm-a-diverse-herd-of-distilled-models | 55.2 |
| scaling-language-models-methods-analysis-1 | 70.1 |
| back-to-square-one-bias-detection-training | 56.3 |
| the-claude-3-model-family-opus-sonnet-haiku | 74.2 |
| back-to-square-one-bias-detection-training | 50.0 |
| winogrande-an-adversarial-winograd-schema | 58.9 |
| pythia-a-suite-for-analyzing-large-language | 60.9 |
| lamini-lm-a-diverse-herd-of-distilled-models | 58.3 |
| lamini-lm-a-diverse-herd-of-distilled-models | 56.0 |
| unicorn-on-rainbow-a-universal-commonsense | 91.3 |
| palm-2-technical-report-1 | 83.0 |
| language-models-are-few-shot-learners | 70.2 |
| back-to-square-one-bias-detection-training | 52.8 |
| knowledge-in-context-towards-knowledgeable | 55.30 |
| mixtral-of-experts | 77.2 |
| gpt-4-technical-report-1 | 81.6 |
| Model 68 | 70.8 |
| parameter-efficient-sparsity-crafting-from | 80.9 |
| back-to-square-one-bias-detection-training | 58.7 |
| bloomberggpt-a-large-language-model-for | 60.6 |
| finetuned-language-models-are-zero-shot | 71.2 |
| mistral-7b | 75.3 |
| palm-scaling-language-modeling-with-pathways-1 | 77.0 |
| mixlora-enhancing-large-language-models-fine | 86.3 |
| g-daug-generative-data-augmentation-for | 71.4 |
| mixlora-enhancing-large-language-models-fine | 76.8 |