Math Word Problem Solving on SVAMP

Metrics

Execution Accuracy
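
The page does not define the metric. Execution accuracy is commonly understood as the percentage of problems for which the answer obtained by executing the model's output (an expression or program) matches the reference answer. The following is a minimal sketch under that assumption; the function name `execution_accuracy`, the tolerance `tol`, and the use of `None` for failed executions are illustrative choices, not taken from any of the listed papers.

```python
import math

def execution_accuracy(predicted, gold, tol=1e-4):
    """Return the percentage of problems answered correctly.

    predicted: numeric answers produced by executing each model output,
               with None where execution failed (assumed convention).
    gold:      reference numeric answers, in the same order.
    tol:       absolute tolerance for the comparison (assumed value).
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(
        1
        for p, g in zip(predicted, gold)
        if p is not None and math.isclose(p, g, abs_tol=tol)
    )
    return 100.0 * correct / len(gold)

# Two of the four hypothetical answers match their references.
score = execution_accuracy([3.0, None, 7.5, 10.0], [3.0, 4.0, 7.5, 11.0])
print(score)
```

Individual papers may differ in how they parse outputs, round answers, or treat execution failures, so scores computed this way are not guaranteed to reproduce the figures in the table.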

Results

Performance results of various models on this benchmark

Model Name	Execution Accuracy (%)	Paper Title	Repository
Qwen2 (CoT + Code Interpreter)	92.3	-	-
GTS with RoBERTa	41.0	Are NLP Models really able to Solve Simple Math Word Problems?	-
MMOS-CODE-7B (0-shot)	76.4	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	-
OpenMath-CodeLlama-70B (w/ code)	87.8	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	-
GPT-4 (Model Selection)	93.7	Automatic Model Selection with Large Language Models for Reasoning	-
ATHENA (roberta-large)	54.8	ATHENA: Mathematical Reasoning with Thought Expansion	-
ATHENA (roberta-base)	45.6	ATHENA: Mathematical Reasoning with Thought Expansion	-
DeBERTa	63.5	Math Word Problem Solving by Generating Linguistic Variants of Problem Statements	-
MsAT-DeductReasoner	48.9	Learning Multi-Step Reasoning by Solving Arithmetic Tasks	-
LSTM Seq2Seq with RoBERTa	40.3	Are NLP Models really able to Solve Simple Math Word Problems?	-
GPT-4 (Teaching-Inspired)	93.9	Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models	-
PaLM (zero-shot)	58.8	Large Language Models are Zero-Shot Reasoners	-
PaLM (zero-shot, CoT)	62.1	Large Language Models are Zero-Shot Reasoners	-
GPT-4 DUP	-	Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems	-
GPT-4 (PHP)	91.9	Progressive-Hint Prompting Improves Reasoning in Large Language Models	-
Roberta-DeductReasoner	47.3	Learning to Reason Deductively: Math Word Problem Solving as Complex Relation Extraction	-
SYRELM (GPT-J)	40.1	Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning	-
LLaMA 2-Chat	69.2	Llama 2: Open Foundation and Fine-Tuned Chat Models	-
Graph2Tree with RoBERTa	43.8	Are NLP Models really able to Solve Simple Math Word Problems?	-
MMOS-DeepSeekMath-7B (0-shot)	79.3	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	-
