HyperAI超神经
Math Word Problem Solving on SVAMP
Evaluation metric: Execution Accuracy

Evaluation Results
Performance of each model on this benchmark.
| Model | Execution Accuracy | Paper Title | Repository |
| --- | --- | --- | --- |
| Qwen2 (CoT + Code Interpreter) | 92.3 | - | - |
| GTS with RoBERTa | 41.0 | Are NLP Models really able to Solve Simple Math Word Problems? | |
| MMOS-CODE-7B (0-shot) | 76.4 | An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | |
| OpenMath-CodeLlama-70B (w/ code) | 87.8 | OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | |
| GPT-4 (Model Selection) | 93.7 | Automatic Model Selection with Large Language Models for Reasoning | |
| ATHENA (roberta-large) | 54.8 | ATHENA: Mathematical Reasoning with Thought Expansion | |
| ATHENA (roberta-base) | 45.6 | ATHENA: Mathematical Reasoning with Thought Expansion | |
| DeBERTa | 63.5 | Math Word Problem Solving by Generating Linguistic Variants of Problem Statements | |
| MsAT-DeductReasoner | 48.9 | Learning Multi-Step Reasoning by Solving Arithmetic Tasks | |
| LSTM Seq2Seq with RoBERTa | 40.3 | Are NLP Models really able to Solve Simple Math Word Problems? | |
| GPT-4 (Teaching-Inspired) | 93.9 | Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models | - |
| PaLM (zero-shot) | 58.8 | Large Language Models are Zero-Shot Reasoners | |
| PaLM (zero-shot, CoT) | 62.1 | Large Language Models are Zero-Shot Reasoners | |
| GPT-4 DUP | - | Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems | |
| GPT-4 (PHP) | 91.9 | Progressive-Hint Prompting Improves Reasoning in Large Language Models | |
| Roberta-DeductReasoner | 47.3 | Learning to Reason Deductively: Math Word Problem Solving as Complex Relation Extraction | |
| SYRELM (GPT-J) | 40.1 | Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning | |
| LLaMA 2-Chat | 69.2 | Llama 2: Open Foundation and Fine-Tuned Chat Models | |
| Graph2Tree with RoBERTa | 43.8 | Are NLP Models really able to Solve Simple Math Word Problems? | |
| MMOS-DeepSeekMath-7B (0-shot) | 79.3 | An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | |