Math Word Problem Solving on SVAMP
Metrics
Execution Accuracy
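Execution accuracy on SVAMP measures the fraction of problems for which executing (or parsing) the model's generated solution yields the annotated numeric answer. The sketch below illustrates one common way to compute it; the function name, input format, and tolerance are illustrative assumptions, not part of any official SVAMP evaluation script.

```python
# A minimal sketch of execution accuracy for SVAMP-style benchmarks:
# a prediction counts as correct when the executed answer matches the
# gold numeric answer within a small tolerance. Names and the tolerance
# value are assumptions for illustration.
def execution_accuracy(predicted_values, gold_answers, tol=1e-4):
    """Percentage of problems whose executed answer matches the gold answer."""
    correct = 0
    for pred, gold in zip(predicted_values, gold_answers):
        try:
            if abs(float(pred) - float(gold)) <= tol:
                correct += 1
        except (TypeError, ValueError):
            pass  # failed executions / unparseable outputs count as wrong
    return 100.0 * correct / len(gold_answers)

# Example: two of three executed answers match the gold answers -> 66.67
print(round(execution_accuracy(["7", "3.0", "error"], [7, 3, 12]), 2))
```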
Results
Execution accuracy of models evaluated on SVAMP. Each row in the comparison table below is identified by the slug of its source paper; papers that report several model variants appear more than once.
Comparison Table
Model (paper slug) | Execution Accuracy (%) |
---|---|
are-nlp-models-really-able-to-solve-simple | 41.0 |
an-empirical-study-of-data-ability-boundary | 76.4 |
openmathinstruct-1-a-1-8-million-math | 87.8 |
automatic-model-selection-with-large-language | 93.7 |
athena-mathematical-reasoning-with-thought | 54.8 |
athena-mathematical-reasoning-with-thought | 45.6 |
math-word-problem-solving-by-generating | 63.5 |
learning-multi-step-reasoning-from-arithmetic | 48.9 |
are-nlp-models-really-able-to-solve-simple | 40.3 |
teaching-inspired-integrated-prompting | 93.9 |
large-language-models-are-zero-shot-reasoners | 58.8 |
large-language-models-are-zero-shot-reasoners | 62.1 |
achieving-97-on-gsm8k-deeply-understanding | - |
progressive-hint-prompting-improves-reasoning | 91.9 |
learning-to-reason-deductively-math-word | 47.3 |
frugal-lms-trained-to-invoke-symbolic-solvers | 40.1 |
llama-2-open-foundation-and-fine-tuned-chat | 69.2 |
are-nlp-models-really-able-to-solve-simple | 43.8 |
an-empirical-study-of-data-ability-boundary | 79.3 |
mathcoder-seamless-code-integration-in-llms | 84.9 |
are-nlp-models-really-able-to-solve-simple | 38.9 |
frugal-lms-trained-to-invoke-symbolic-solvers | 56.65 |
an-empirical-study-of-data-ability-boundary | 80.6 |
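For convenience, the table above can be loaded and ranked programmatically. The snippet below is illustrative only: the file name `svamp_results.md` and the line-by-line parsing are assumptions about how this page was saved, not part of the benchmark.

```python
# Parse the markdown comparison table and rank entries by accuracy.
import pandas as pd

rows = []
with open("svamp_results.md") as f:  # assumed file holding the table above
    for line in f:
        parts = [p.strip() for p in line.strip().strip("|").split("|")]
        if len(parts) != 2:
            continue
        try:
            rows.append((parts[0], float(parts[1])))
        except ValueError:
            continue  # skips the header, separator, and "-" entries

df = pd.DataFrame(rows, columns=["model", "execution_accuracy"])
print(df.sort_values("execution_accuracy", ascending=False).head())
```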