HyperAI

Math Word Problem Solving on SVAMP

Metrics

Execution Accuracy
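As the metric name suggests, a prediction counts as correct when executing it (for example, evaluating a predicted arithmetic equation) yields the gold numeric answer. The sketch below is illustrative only. It assumes each prediction is a plain arithmetic expression using +, -, * and / over numeric literals, and it uses an absolute tolerance of 1e-4. The function names `safe_eval` and `execution_accuracy` are hypothetical and do not come from any official SVAMP evaluation script. Entries that emit full programs, such as the code-interpreter runs in the table, would need a sandboxed program executor in place of `safe_eval`.

```python
import ast
import operator
from typing import Sequence

# Hypothetical operator whitelist: assumes predictions use only +, -, *, /.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_eval(expression: str) -> float:
    """Evaluate a plain arithmetic expression without calling eval()."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Numeric literals only (bool is a subclass of int, so exclude it).
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return float(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    # ast.parse raises SyntaxError on malformed input.
    return _eval(ast.parse(expression, mode="eval"))


def execution_accuracy(predictions: Sequence[str],
                       answers: Sequence[float],
                       tol: float = 1e-4) -> float:
    """Percentage of predictions whose executed value matches the gold answer."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("no examples to score")
    correct = 0
    for expr, gold in zip(predictions, answers):
        try:
            value = safe_eval(expr)
        except (ValueError, SyntaxError, ZeroDivisionError):
            continue  # a prediction that fails to execute counts as wrong
        if abs(value - gold) <= tol:
            correct += 1
    return 100.0 * correct / len(answers)


if __name__ == "__main__":
    preds = ["(5 + 3) * 2", "10 / 4", "7 - "]
    gold = [16.0, 2.0, 4.0]
    print(execution_accuracy(preds, gold))
```

In the usage example, only the first expression evaluates to its gold answer. The second evaluates to 2.5, and the third is malformed, so the score is one correct out of three (about 33.3). Scoring by executed value rather than by string match means that equivalent equations such as `2 * (3 + 5)` and `(5 + 3) * 2` are both accepted.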

Results

Reported execution accuracy of various models on the SVAMP benchmark

| Model Name | Execution Accuracy (%) | Paper Title | Repository |
| --- | --- | --- | --- |
| Qwen2 (CoT + Code Interpreter) | 92.3 | - | - |
| GTS with RoBERTa | 41.0 | Are NLP Models really able to Solve Simple Math Word Problems? | - |
| MMOS-CODE-7B (0-shot) | 76.4 | An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | - |
| OpenMath-CodeLlama-70B (w/ code) | 87.8 | OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | - |
| GPT-4 (Model Selection) | 93.7 | Automatic Model Selection with Large Language Models for Reasoning | - |
| ATHENA (roberta-large) | 54.8 | ATHENA: Mathematical Reasoning with Thought Expansion | - |
| ATHENA (roberta-base) | 45.6 | ATHENA: Mathematical Reasoning with Thought Expansion | - |
| DeBERTa | 63.5 | Math Word Problem Solving by Generating Linguistic Variants of Problem Statements | - |
| MsAT-DeductReasoner | 48.9 | Learning Multi-Step Reasoning by Solving Arithmetic Tasks | - |
| LSTM Seq2Seq with RoBERTa | 40.3 | Are NLP Models really able to Solve Simple Math Word Problems? | - |
| GPT-4 (Teaching-Inspired) | 93.9 | Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models | - |
| PaLM (zero-shot) | 58.8 | Large Language Models are Zero-Shot Reasoners | - |
| PaLM (zero-shot, CoT) | 62.1 | Large Language Models are Zero-Shot Reasoners | - |
| GPT-4 DUP | - | Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems | - |
| GPT-4 (PHP) | 91.9 | Progressive-Hint Prompting Improves Reasoning in Large Language Models | - |
| Roberta-DeductReasoner | 47.3 | Learning to Reason Deductively: Math Word Problem Solving as Complex Relation Extraction | - |
| SYRELM (GPT-J) | 40.1 | Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning | - |
| LLaMA 2-Chat | 69.2 | Llama 2: Open Foundation and Fine-Tuned Chat Models | - |
| Graph2Tree with RoBERTa | 43.8 | Are NLP Models really able to Solve Simple Math Word Problems? | - |
| MMOS-DeepSeekMath-7B (0-shot) | 79.3 | An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | - |