Math Word Problem Solving on SVAMP
Evaluation Metric
Execution Accuracy
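The page does not define the metric. As a minimal sketch, assuming execution accuracy means the percentage of problems where the model's predicted arithmetic expression, once executed, yields the gold numeric answer, it could be computed as follows. The function names `execute` and `execution_accuracy`, the tolerance `tol`, and the restriction to the four basic operators are illustrative assumptions, not part of the benchmark or of any listed paper.

```python
import ast
import operator

# Arithmetic operators allowed in a predicted expression (illustrative choice).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def execute(expr: str) -> float:
    """Evaluate a plain arithmetic expression such as "(8 - 3) * 2"."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return float(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")

    return walk(ast.parse(expr, mode="eval"))


def execution_accuracy(predictions, gold_answers, tol=1e-4):
    """Percentage of predictions whose executed value matches the gold answer."""
    correct = 0
    for expr, gold in zip(predictions, gold_answers):
        try:
            value = execute(expr)
        except (ValueError, SyntaxError, ZeroDivisionError):
            continue  # an expression that fails to execute counts as wrong
        if abs(value - gold) <= tol:
            correct += 1
    return 100.0 * correct / len(gold_answers)
```

Under this assumption, a prediction that cannot be parsed or that divides by zero is simply scored as incorrect, so the denominator is always the full number of problems.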
Evaluation Results
Performance results of each model on this benchmark
| Model Name | Execution Accuracy | Paper Title | Repository |
| --- | --- | --- | --- |
| Qwen2(CoT + Code Interpreter) | 92.3 | - | - |
| GTS with RoBERTa | 41.0 | Are NLP Models really able to Solve Simple Math Word Problems? | |
| MMOS-CODE-7B(0-shot) | 76.4 | An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | |
| OpenMath-CodeLlama-70B (w/ code) | 87.8 | OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | |
| GPT-4 (Model Selection) | 93.7 | Automatic Model Selection with Large Language Models for Reasoning | |
| ATHENA (roberta-large) | 54.8 | ATHENA: Mathematical Reasoning with Thought Expansion | |
| ATHENA (roberta-base) | 45.6 | ATHENA: Mathematical Reasoning with Thought Expansion | |
| DeBERTa | 63.5 | Math Word Problem Solving by Generating Linguistic Variants of Problem Statements | |
| MsAT-DeductReasoner | 48.9 | Learning Multi-Step Reasoning by Solving Arithmetic Tasks | |
| LSTM Seq2Seq with RoBERTa | 40.3 | Are NLP Models really able to Solve Simple Math Word Problems? | |
| GPT-4 (Teaching-Inspired) | 93.9 | Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models | - |
| PaLM (zero-shot) | 58.8 | Large Language Models are Zero-Shot Reasoners | |
| PaLM (zero-shot, CoT) | 62.1 | Large Language Models are Zero-Shot Reasoners | |
| GPT-4 DUP | - | Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems | |
| GPT-4 (PHP) | 91.9 | Progressive-Hint Prompting Improves Reasoning in Large Language Models | |
| Roberta-DeductReasoner | 47.3 | Learning to Reason Deductively: Math Word Problem Solving as Complex Relation Extraction | |
| SYRELM (GPT-J) | 40.1 | Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning | |
| LLaMA 2-Chat | 69.2 | Llama 2: Open Foundation and Fine-Tuned Chat Models | |
| Graph2Tree with RoBERTa | 43.8 | Are NLP Models really able to Solve Simple Math Word Problems? | |
| MMOS-DeepSeekMath-7B(0-shot) | 79.3 | An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | |