Common Sense Reasoning on ARC-Challenge
Metrics
Accuracy
Results
Performance results of various models on this benchmark
Comparison Table
Model Name | Accuracy (%) |
---|---|
model-card-and-evaluations-for-claude-models | 91 |
large-language-models-can-self-improve | 88.3 |
gpt-4-technical-report-1 | 96.4 |
parameter-efficient-sparsity-crafting-from | 65.2 |
Model 5 | 91.03 |
mixture-of-subspaces-in-low-rank-adaptation | 81.5 |
large-language-models-can-self-improve | 85.2 |
glam-efficient-scaling-of-language-models | 50.3 |
large-language-models-can-self-improve | 87.1 |
designing-effective-sparse-expert-models | 86.5 |
palm-2-technical-report-1 | 59.6 |
galactica-a-large-language-model-for-science-1 | 32.9 |
galactica-a-large-language-model-for-science-1 | 67.9 |
galactica-a-large-language-model-for-science-1 | 31.1 |
mixlora-enhancing-large-language-models-fine | 58.1 |
massive-language-models-can-be-accurately | 25.6 |
glam-efficient-scaling-of-language-models | 48.2 |
mixlora-enhancing-large-language-models-fine | 69.9 |
mixlora-enhancing-large-language-models-fine | 79.9 |
palm-2-technical-report-1 | 95.1 |
massive-language-models-can-be-accurately | 43.94 |
language-models-are-few-shot-learners | 51.4 |
large-language-models-can-self-improve | 89.8 |
palm-2-technical-report-1 | 64.9 |
unifying-language-learning-paradigms | 49.5 |
llama-open-and-efficient-foundation-language-1 | 56.0 |
Model 27 | 91.04 |
unifying-language-learning-paradigms | 29.8 |
massive-language-models-can-be-accurately | 38.99 |
palm-2-technical-report-1 | 69.2 |
llama-open-and-efficient-foundation-language-1 | 47.6 |
pythia-a-suite-for-analyzing-large-language | 36.8 |
massive-language-models-can-be-accurately | 41.3 |
pythia-a-suite-for-analyzing-large-language | 31.8 |
massive-language-models-can-be-accurately | 39.85 |
large-language-models-can-self-improve | 87.2 |
finetuned-language-models-are-zero-shot | 63.1 |
llama-open-and-efficient-foundation-language-1 | 52.7 |
model-card-and-evaluations-for-claude-models | 85.7 |
designing-effective-sparse-expert-models | 56.9 |
llama-open-and-efficient-foundation-language-1 | 57.8 |
bloomberggpt-a-large-language-model-for | 50.85 |
model-card-and-evaluations-for-claude-models | 90 |
large-language-models-can-self-improve | 88.7 |
language-models-are-few-shot-learners | 53.2 |
bloomberggpt-a-large-language-model-for | 48.63 |
unifying-language-learning-paradigms | 42.9 |
bloomberggpt-a-large-language-model-for | 45.39 |
finetuned-language-models-are-zero-shot | 63.8 |
galactica-a-large-language-model-for-science-1 | 51.4 |
mistral-7b | 55.5 |
gpt-4-technical-report-1 | 85.2 |
bloomberggpt-a-large-language-model-for | 44.54 |
textbooks-are-all-you-need-ii-phi-1-5 | 44.9 |