Math Word Problem Solving on ASDiv-A
Evaluation Metric
Execution Accuracy
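As a rough sketch of what this metric measures: a prediction counts as correct only if *executing* the model's predicted expression or program yields the gold numeric answer, rather than requiring an exact string match on the equation. The function below is a minimal illustration under that assumption; the function name, tolerance, and expression format are hypothetical, not taken from any of the papers listed here.

```python
# Hypothetical sketch of Execution Accuracy: execute each predicted
# arithmetic expression and compare the result to the gold answer.
def execution_accuracy(predicted_exprs, gold_answers, tol=1e-4):
    correct = 0
    for expr, gold in zip(predicted_exprs, gold_answers):
        try:
            # Execute the predicted expression (builtins disabled for safety).
            value = eval(expr, {"__builtins__": {}})
        except Exception:
            continue  # non-executable predictions count as wrong
        if abs(value - gold) <= tol:
            correct += 1
    return correct / len(predicted_exprs)

# Two predictions, both executing to the gold answer -> accuracy 1.0
print(execution_accuracy(["3 + 4 * 2", "10 / 4"], [11, 2.5]))
```

Note that a syntactically different but numerically equivalent expression (e.g. `2 * 4 + 3` vs. `3 + 4 * 2`) still scores as correct, which is the point of execution-based scoring.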
Evaluation Results
Performance of each model on this benchmark:
| Model | Execution Accuracy | Paper Title | Repository |
|---|---|---|---|
| ATHENA (roberta-large) | 91.0 | ATHENA: Mathematical Reasoning with Thought Expansion | |
| MMOS-DeepSeekMath-7B (0-shot) | 87.6 | An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | |
| ATHENA (roberta-base) | 86.4 | ATHENA: Mathematical Reasoning with Thought Expansion | |
| MMOS-CODE-34B (0-shot) | 85.1 | An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | |
| OpenMath-CodeLlama-70B (w/ code) | 84.7 | OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | |
| Graph2Tree with RoBERTa | 82.2 | Are NLP Models really able to Solve Simple Math Word Problems? | |
| GTS with RoBERTa | 81.2 | Are NLP Models really able to Solve Simple Math Word Problems? | |
| MMOS-CODE-7B (0-shot) | 78.6 | An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | |
| LSTM Seq2Seq with RoBERTa | 76.9 | Are NLP Models really able to Solve Simple Math Word Problems? | |