Arithmetic Reasoning on GSM8K
Metrics
Accuracy
Parameters (Billion)
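Accuracy on GSM8K is conventionally exact match on the final numeric answer: reference solutions end with a `#### <answer>` marker, and a model's completion is parsed for its final number. A minimal sketch of that scoring loop (the regex and last-number fallback are illustrative heuristics, not any specific paper's parser):

```python
import re

def extract_final_answer(text):
    """Pull the final numeric answer from a solution string.

    GSM8K reference solutions end with '#### <answer>'; if that marker
    is absent (as in many model outputs), fall back to the last number
    in the text.
    """
    m = re.search(r"####\s*(-?[\d,\.]+)", text)
    if m:
        return m.group(1).replace(",", "")  # normalize '1,200' -> '1200'
    nums = re.findall(r"-?\d+(?:\.\d+)?", text)
    return nums[-1] if nums else None

def accuracy(predictions, references):
    """Exact-match accuracy (%) over extracted final answers."""
    correct = sum(
        extract_final_answer(p) == extract_final_answer(r)
        for p, r in zip(predictions, references)
    )
    return 100.0 * correct / len(references)
```

For example, a completion ending in `#### 42` scored against a reference `#### 42` counts as correct regardless of how the intermediate reasoning differs.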
Results
Performance results of the various models on this benchmark
Comparison Table
Model Name | Accuracy | Parameters (Billion) |
---|---|---|
lever-learning-to-verify-language-to-code | 84.5 | 175 |
composing-ensembles-of-pre-trained-models-via | 16.8 | 0.355 |
tora-a-tool-integrated-reasoning-agent-for | 72.6 | 7 |
openmathinstruct-1-a-1-8-million-math | 75.9 | 7 |
gemini-a-family-of-highly-capable-multimodal-1 | 86.5 | - |
transcending-scaling-laws-with-0-1-extra | 58.5 | 540 |
mathcoder-seamless-code-integration-in-llms | 74.1 | 7 |
openmathinstruct-2-accelerating-ai-for-math | 94.9 | - |
the-art-of-llm-refinement-ask-refine-and | 82.6 | - |
parameter-efficient-sparsity-crafting-from | 78.3 | - |
orca-math-unlocking-the-potential-of-slms-in | 86.8 | 7 |
large-language-models-can-self-improve | 74.4 | 540 |
llama-open-and-efficient-foundation-language-1 | 17.8 | 13 |
an-empirical-study-of-data-ability-boundary | 73.9 | 7 |
outcome-supervised-verifiers-for-planning-in | 84.7 | 7 |
solving-quantitative-reasoning-problems-with | 89 | 62 |
Model 17 | 95.2 | 75 |
on-the-advance-of-making-language-models | 83.2 | 175 |
llama-open-and-efficient-foundation-language-1 | 53.1 | 33 |
dart-math-difficulty-aware-rejection-tuning-1 | 82.5 | 8 |
model-card-and-evaluations-for-claude-models | 85.2 | - |
sparks-of-artificial-general-intelligence | 87.1 | - |
wizardmath-empowering-mathematical-reasoning | 63.9 | 13 |
large-language-models-can-self-improve | 32.2 | 540 |
wizardmath-empowering-mathematical-reasoning | 83.2 | 7 |
openmathinstruct-1-a-1-8-million-math | 86.8 | 13 |
palm-2-technical-report-1 | 80.7 | - |
the-claude-3-model-family-opus-sonnet-haiku | 92.3 | - |
large-language-models-are-zero-shot-reasoners | 51.5 | 175 |
orca-2-teaching-small-language-models-how-to | 59.14 | 13 |
scaling-relationship-on-learning-mathematical | 51.2 | 7 |
solving-math-word-problems-with-process-and | 76.5 | 70 |
palm-2-technical-report-1 | 91.0 | - |
Model 34 | 80.2 | - |
openmathinstruct-2-accelerating-ai-for-math | 96.0 | - |
solving-math-word-problems-with-process-and | 87.1 | 70 |
Model 37 | 66.9 | - |
mathcoder-seamless-code-integration-in-llms | 67.8 | 7 |
tora-a-tool-integrated-reasoning-agent-for | 84.3 | 70 |
an-empirical-study-of-data-ability-boundary | 80.5 | 7 |
large-language-models-are-zero-shot-reasoners | 17.9 | 540 |
dart-math-difficulty-aware-rejection-tuning-1 | 82.6 | 7 |
solving-quantitative-reasoning-problems-with | 56.5 | 540 |
large-language-models-are-zero-shot-reasoners | 55.0 | 175 |
scaling-relationship-on-learning-mathematical | 64.8 | 79 |
Model 46 | 96.4 | 7 |
metamath-bootstrap-your-own-mathematical | 82.3 | 70 |
large-language-models-are-zero-shot-reasoners | 40.7 | 175 |
hierarchical-prompting-taxonomy-a-universal | 97.72 | - |
large-language-models-can-self-improve | 82.1 | 540 |
openmathinstruct-1-a-1-8-million-math | 90.8 | 70 |
Model 52 | 71.9 | - |
least-to-most-prompting-enables-complex | 68.01 | 175 |
dart-math-difficulty-aware-rejection-tuning-1 | 90.4 | 70 |
teaching-inspired-integrated-prompting | 94.8 | - |
solving-quantitative-reasoning-problems-with | 28.4 | 8 |
openchat-advancing-open-source-language | 77.3 | 7 |
large-language-models-are-zero-shot-reasoners | 10.4 | 175 |
model-card-and-evaluations-for-claude-models | 88 | - |
composing-ensembles-of-pre-trained-models-via | 18.3 | 0.355 |
openmathinstruct-1-a-1-8-million-math | 80.7 | 34 |
Model 62 | 76.4 | 7 |
openmathinstruct-1-a-1-8-million-math | 84.6 | 70 |
openmathinstruct-1-a-1-8-million-math | 78.8 | 13 |
wizardmath-empowering-mathematical-reasoning | 81.6 | 70 |
orca-2-teaching-small-language-models-how-to | 47.23 | 7 |
solving-quantitative-reasoning-problems-with | 56.8 | 8 |
tora-a-tool-integrated-reasoning-agent-for | 80.7 | 34 |
dart-math-difficulty-aware-rejection-tuning-1 | 88.2 | 7 |
deepseekmath-pushing-the-limits-of | 88.2 | 7 |
an-empirical-study-of-data-ability-boundary | 80.4 | 34 |
openmathinstruct-1-a-1-8-million-math | 90.1 | 70 |
llama-open-and-efficient-foundation-language-1 | 11.0 | 7 |
the-claude-3-model-family-opus-sonnet-haiku | 88.9 | - |
qwen2-technical-report | 96.7 | 72 |
large-language-models-are-zero-shot-reasoners | 58.1 | 540 |
solving-quantitative-reasoning-problems-with | 52.4 | 62 |
composing-ensembles-of-pre-trained-models-via | 12.2 | 0.355 |
large-language-models-can-self-improve | 56.5 | 540 |
openmathinstruct-2-accelerating-ai-for-math | 91.7 | - |
the-unreasonable-effectiveness-of-eccentric | 43 | 13 |
solving-quantitative-reasoning-problems-with | 4.1 | 8 |
llama-open-and-efficient-foundation-language-1 | 35.6 | 33 |
openmathinstruct-2-accelerating-ai-for-math | 94.1 | - |
Model 85 | 93.2 | 7 |
Model 86 | 89.0 | 13 |
gpt-4-technical-report-1 | 57.1 | - |
llama-open-and-efficient-foundation-language-1 | 18.1 | 7 |
Model 89 | 94.13 | 7 |
frugal-lms-trained-to-invoke-symbolic-solvers | 35.2 | 13 |
openmathinstruct-1-a-1-8-million-math | 84.8 | 7 |
kwaiyiimath-technical-report | 73.3 | 13 |
dart-math-difficulty-aware-rejection-tuning-1 | 81.1 | 7 |
unifying-language-learning-paradigms | 4.4 | 20 |
openmathinstruct-1-a-1-8-million-math | 80.2 | 7 |
large-language-models-can-self-improve | 73.5 | 540 |
tora-a-tool-integrated-reasoning-agent-for | 75.8 | 13 |
solving-quantitative-reasoning-problems-with | 16.2 | 8 |
tinygsm-achieving-80-on-gsm8k-with-small | 81.5 | 2.6 |
large-language-models-are-zero-shot-reasoners | 41.3 | 175 |
llama-2-open-foundation-and-fine-tuned-chat | 56.8 | 70 |
the-unreasonable-effectiveness-of-eccentric | 41 | 7 |
learning-from-self-sampled-correct-and | 19.5 | 2.7 |
mathcoder-seamless-code-integration-in-llms | 72.6 | 13 |
math-shepherd-a-label-free-step-by-step | 89.1 | 7 |
scaling-relationship-on-learning-mathematical | 55.3 | 13 |
outcome-supervised-verifiers-for-planning-in | 73.7 | 7 |
llama-open-and-efficient-foundation-language-1 | 50.9 | 65 |
metamath-bootstrap-your-own-mathematical | 77.7 | 7 |
large-language-models-can-self-improve | 17.9 | 540 |
codet5-open-code-large-language-models-for | 73.8 | 0.77 |
query-and-response-augmentation-cannot-help | 69.8 | 7 |
solving-math-word-problems-with-process-and | 87.3 | 70 |
Model 114 | 95.1 | 7 |
metamath-bootstrap-your-own-mathematical | 66.4 | 7 |
openmathinstruct-1-a-1-8-million-math | 84.7 | 70 |
model-card-and-evaluations-for-claude-models | 80.9 | - |
math-shepherd-a-label-free-step-by-step | 84.1 | 7 |
self-consistency-improves-chain-of-thought | 74.4 | 540 |
query-and-response-augmentation-cannot-help | 82.3 | 70 |
llemma-an-open-language-model-for-mathematics | 36.4 | 7 |
solving-quantitative-reasoning-problems-with | 78.5 | 540 |
llemma-an-open-language-model-for-mathematics | 51.5 | 34 |
metamath-bootstrap-your-own-mathematical | 71.0 | 13 |
mathcoder-seamless-code-integration-in-llms | 83.9 | 70 |
query-and-response-augmentation-cannot-help | 74 | 13 |
the-unreasonable-effectiveness-of-eccentric | 61 | 70 |
solving-quantitative-reasoning-problems-with | 33.0 | 62 |
the-claude-3-model-family-opus-sonnet-haiku | 95 | - |
toward-self-improvement-of-llms-via | 92 | 70 |
mistral-7b | 52.2 | 7 |
llama-open-and-efficient-foundation-language-1 | 29.3 | 13 |
unifying-language-learning-paradigms | 4.1 | 20 |
learning-from-self-sampled-correct-and | 7.5 | 0.125 |
step-dpo-step-wise-preference-optimization | 94.0 | - |
Model 136 | 74.7 | 7 |
openmathinstruct-1-a-1-8-million-math | 86.9 | 7 |
composing-ensembles-of-pre-trained-models-via | 20.8 | 0.355 |
breaking-the-ceiling-of-the-llm-community-by | 90.91 | - |
tinygsm-achieving-80-on-gsm8k-with-small | 74.3 | 2.7 |
solving-quantitative-reasoning-problems-with | 68.5 | 62 |
tora-a-tool-integrated-reasoning-agent-for | 88.3 | 70 |
Model 143 | 72.3 | - |
wizardmath-empowering-mathematical-reasoning | 54.9 | 7 |
llama-open-and-efficient-foundation-language-1 | 69.7 | 65 |
dart-math-difficulty-aware-rejection-tuning-1 | 81.1 | 8 |
mathcoder-seamless-code-integration-in-llms | 81.7 | 34 |
solving-math-word-problem-via-cooperative | 63.2 | 12 |
Model 149 | 85.5 | - |
boosting-llm-reasoning-push-the-limits-of-few | 59.59 | 70 |
parameter-efficient-sparsity-crafting-from | 77.8 | - |
branch-train-mix-mixing-expert-llms-into-a | 37.1 | - |
openmathinstruct-1-a-1-8-million-math | 88.0 | 34 |
an-empirical-study-of-data-ability-boundary | 87.2 | 7 |
mathcoder-seamless-code-integration-in-llms | 64.2 | 7 |
tora-a-tool-integrated-reasoning-agent-for | 85.1 | 34 |
Model 157 | 87.41 | 4 |
dart-math-difficulty-aware-rejection-tuning-1 | 89.6 | 70 |
dart-math-difficulty-aware-rejection-tuning-1 | 86.8 | 7 |
outcome-supervised-verifiers-for-planning-in | 82.6 | 7 |