Math Word Problem Solving on ASDiv-A
Evaluation Metric
Execution Accuracy
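Execution accuracy scores a prediction as correct when executing the model's generated program (or arithmetic expression) reproduces the gold answer. A minimal sketch of this metric, assuming predictions are plain arithmetic expressions and using a small numeric tolerance (the function name and tolerance value are illustrative, not from any specific codebase):

```python
def execution_accuracy(predicted_programs, gold_answers, tol=1e-4):
    """Fraction of problems where executing the predicted expression
    yields the gold numeric answer (illustrative helper)."""
    correct = 0
    for prog, gold in zip(predicted_programs, gold_answers):
        try:
            # Execute the predicted arithmetic expression with builtins disabled.
            result = eval(prog, {"__builtins__": {}})
        except Exception:
            continue  # unexecutable programs count as wrong
        if abs(result - gold) <= tol:
            correct += 1
    return correct / len(gold_answers)

# Example: 2 of 3 predicted expressions produce the gold answer.
preds = ["3 + 4", "10 / 2", "6 * 7"]
golds = [7.0, 5.0, 41.0]
print(execution_accuracy(preds, golds))  # prints 0.6666666666666666
```

Real evaluation harnesses execute full generated code (e.g. in a sandbox) rather than `eval` on expressions, but the scoring logic is the same.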
Evaluation Results
Performance of each model on this benchmark:
Model | Execution Accuracy (%) | Paper Title | Repository
---|---|---|---
OpenMath-CodeLlama-70B (w/ code) | 84.7 | OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | |
LSTM Seq2Seq with RoBERTa | 76.9 | Are NLP Models really able to Solve Simple Math Word Problems? | |
MMOS-CODE-34B(0-shot) | 85.1 | An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | |
MMOS-CODE-7B(0-shot) | 78.6 | An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | |
ATHENA (roberta-base) | 86.4 | ATHENA: Mathematical Reasoning with Thought Expansion | |
MMOS-DeepSeekMath-7B(0-shot) | 87.6 | An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | |
Graph2Tree with RoBERTa | 82.2 | Are NLP Models really able to Solve Simple Math Word Problems? | |
GTS with RoBERTa | 81.2 | Are NLP Models really able to Solve Simple Math Word Problems? | |
ATHENA (roberta-large) | 91.0 | ATHENA: Mathematical Reasoning with Thought Expansion |