Math Word Problem Solving on ASDiv-A
Evaluation Metric
Execution Accuracy
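As a rough sketch of what this metric measures: a prediction counts as correct only if *executing* the model's predicted expression or program yields the gold numeric answer, rather than requiring an exact string match on the equation. The function below is a minimal illustration under that assumption; the function name, tolerance, and expression format are hypothetical, not taken from any of the papers listed here.

```python
# Hypothetical sketch of Execution Accuracy: execute each predicted
# arithmetic expression and compare the result to the gold answer.
def execution_accuracy(predicted_exprs, gold_answers, tol=1e-4):
    correct = 0
    for expr, gold in zip(predicted_exprs, gold_answers):
        try:
            # Execute the predicted expression (builtins disabled for safety).
            value = eval(expr, {"__builtins__": {}})
        except Exception:
            continue  # non-executable predictions count as wrong
        if abs(value - gold) <= tol:
            correct += 1
    return correct / len(predicted_exprs)

# Two predictions, both executing to the gold answer -> accuracy 1.0
print(execution_accuracy(["3 + 4 * 2", "10 / 4"], [11, 2.5]))
```

Note that a syntactically different but numerically equivalent expression (e.g. `2 * 4 + 3` vs. `3 + 4 * 2`) still scores as correct, which is the point of execution-based scoring.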
Evaluation Results
Performance of each model on this benchmark:
| Model | Execution Accuracy | Paper Title | Repository |
|---|---|---|---|
| ATHENA (roberta-large) | 91.0 | ATHENA: Mathematical Reasoning with Thought Expansion | |
| MMOS-DeepSeekMath-7B (0-shot) | 87.6 | An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | |
| ATHENA (roberta-base) | 86.4 | ATHENA: Mathematical Reasoning with Thought Expansion | |
| MMOS-CODE-34B (0-shot) | 85.1 | An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | |
| OpenMath-CodeLlama-70B (w/ code) | 84.7 | OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | |
| Graph2Tree with RoBERTa | 82.2 | Are NLP Models really able to Solve Simple Math Word Problems? | |
| GTS with RoBERTa | 81.2 | Are NLP Models really able to Solve Simple Math Word Problems? | |
| MMOS-CODE-7B (0-shot) | 78.6 | An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | |
| LSTM Seq2Seq with RoBERTa | 76.9 | Are NLP Models really able to Solve Simple Math Word Problems? | |