Math Word Problem Solving on ASDiv-A
Evaluation Metric
Execution Accuracy
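For readers unfamiliar with the metric, the sketch below shows one common way execution accuracy is computed: each predicted arithmetic expression is executed, and the result is compared with the reference answer. The helper names `evaluate` and `execution_accuracy`, the tolerance `tol`, and the choice to count non-executable predictions as wrong are assumptions made for illustration. They are not taken from this benchmark's official evaluation code.

```python
import ast
import operator

# Binary operators supported by this illustrative evaluator (an assumption).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression):
    """Evaluate a plain arithmetic expression such as "(3 + 5) * 2"."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


def execution_accuracy(predictions, answers, tol=1e-4):
    """Percentage of predicted expressions whose executed value matches the answer."""
    correct = 0
    for expr, gold in zip(predictions, answers):
        try:
            value = evaluate(expr)
        except (ValueError, SyntaxError, ZeroDivisionError):
            continue  # a prediction that fails to execute counts as wrong
        if abs(value - gold) <= tol:
            correct += 1
    return 100.0 * correct / len(answers)
```

For example, `execution_accuracy(["3 + 5", "7 - 1"], [8, 5])` scores the first prediction as correct and the second as wrong.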
Evaluation Results
Performance of each model on this benchmark
Comparison Table
Model Name | Execution Accuracy (%) |
---|---|
openmathinstruct-1-a-1-8-million-math | 84.7 |
are-nlp-models-really-able-to-solve-simple | 76.9 |
an-empirical-study-of-data-ability-boundary | 85.1 |
an-empirical-study-of-data-ability-boundary | 78.6 |
athena-mathematical-reasoning-with-thought | 86.4 |
an-empirical-study-of-data-ability-boundary | 87.6 |
are-nlp-models-really-able-to-solve-simple | 82.2 |
are-nlp-models-really-able-to-solve-simple | 81.2 |
athena-mathematical-reasoning-with-thought | 91 |