Math Word Problem Solving on SVAMP

Evaluation Metrics

Execution Accuracy
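
This page does not define the metric. As a rough illustration only, the sketch below assumes execution accuracy means the percentage of problems for which the model's predicted arithmetic expression, once evaluated, matches the reference answer. The helper names (`evaluate`, `execution_accuracy`) and the tolerance value are hypothetical and are not taken from any of the listed papers.

```python
import ast
import operator

# Arithmetic operators allowed in a predicted expression (an assumption of this sketch).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr: str) -> float:
    """Safely evaluate a plain arithmetic expression such as '(8 - 3) * 2'."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return float(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expr, mode="eval"))


def execution_accuracy(predictions, answers, tol=1e-4):
    """Percentage of predictions whose evaluated value matches the gold answer."""
    if not predictions:
        return 0.0
    correct = 0
    for expr, gold in zip(predictions, answers):
        try:
            value = evaluate(expr)
        except (ValueError, SyntaxError, ZeroDivisionError):
            continue  # an expression that cannot be executed counts as wrong
        if abs(value - gold) <= tol:
            correct += 1
    return 100.0 * correct / len(predictions)


# Hypothetical predictions: one correct, one wrong, two that fail to execute.
score = execution_accuracy(
    ["(8 - 3) * 2", "7 + 5", "10 / 0", "4 +"],
    [10.0, 13.0, 1.0, 4.0],
)
print(score)
```

Counting unexecutable predictions as incorrect is a design choice of this sketch; individual papers may score such cases differently or compare final answers extracted from free-form text instead.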

Evaluation Results

Performance of each model on this benchmark

Model	Execution Accuracy	Paper Title
GPT-4 (Teaching-Inspired)	93.9	Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models
GPT-4 (Model Selection)	93.7	Automatic Model Selection with Large Language Models for Reasoning
Qwen2 (CoT + Code Interpreter)	92.3	-
GPT-4 (PHP)	91.9	Progressive-Hint Prompting Improves Reasoning in Large Language Models
OpenMath-CodeLlama-70B (w/ code)	87.8	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
MathCoder-L-70B	84.9	MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
MMOS-CODE-34B (0-shot)	80.6	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning
MMOS-DeepSeekMath-7B (0-shot)	79.3	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning
MMOS-CODE-7B (0-shot)	76.4	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning
LLaMA 2-Chat	69.2	Llama 2: Open Foundation and Fine-Tuned Chat Models
DeBERTa	63.5	Math Word Problem Solving by Generating Linguistic Variants of Problem Statements
PaLM (zero-shot, CoT)	62.1	Large Language Models are Zero-Shot Reasoners
PaLM (zero-shot)	58.8	Large Language Models are Zero-Shot Reasoners
SYRELM (Vicuna 13B)	56.65	Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning
ATHENA (roberta-large)	54.8	ATHENA: Mathematical Reasoning with Thought Expansion
MsAT-DeductReasoner	48.9	Learning Multi-Step Reasoning by Solving Arithmetic Tasks
Roberta-DeductReasoner	47.3	Learning to Reason Deductively: Math Word Problem Solving as Complex Relation Extraction
ATHENA (roberta-base)	45.6	ATHENA: Mathematical Reasoning with Thought Expansion
Graph2Tree with RoBERTa	43.8	Are NLP Models really able to Solve Simple Math Word Problems?
GTS with RoBERTa	41.0	Are NLP Models really able to Solve Simple Math Word Problems?
