HyperAI

Math Word Problem Solving On Svamp

Metrics

Execution Accuracy

Results

Performance of various models on this benchmark

Comparison Table
| Model Name | Execution Accuracy |
| --- | --- |
| Model 1 | 92.3 |
| are-nlp-models-really-able-to-solve-simple | 41.0 |
| an-empirical-study-of-data-ability-boundary | 76.4 |
| openmathinstruct-1-a-1-8-million-math | 87.8 |
| automatic-model-selection-with-large-language | 93.7 |
| athena-mathematical-reasoning-with-thought | 54.8 |
| athena-mathematical-reasoning-with-thought | 45.6 |
| math-word-problem-solving-by-generating | 63.5 |
| learning-multi-step-reasoning-from-arithmetic | 48.9 |
| are-nlp-models-really-able-to-solve-simple | 40.3 |
| teaching-inspired-integrated-prompting | 93.9 |
| large-language-models-are-zero-shot-reasoners | 58.8 |
| large-language-models-are-zero-shot-reasoners | 62.1 |
| achieving-97-on-gsm8k-deeply-understanding- | |
| progressive-hint-prompting-improves-reasoning | 91.9 |
| learning-to-reason-deductively-math-word | 47.3 |
| frugal-lms-trained-to-invoke-symbolic-solvers | 40.1 |
| llama-2-open-foundation-and-fine-tuned-chat | 69.2 |
| are-nlp-models-really-able-to-solve-simple | 43.8 |
| an-empirical-study-of-data-ability-boundary | 79.3 |
| mathcoder-seamless-code-integration-in-llms | 84.9 |
| are-nlp-models-really-able-to-solve-simple | 38.9 |
| frugal-lms-trained-to-invoke-symbolic-solvers | 56.65 |
| an-empirical-study-of-data-ability-boundary | 80.6 |