HyperAI초신경

Math Word Problem Solving on ASDiv-A

Evaluation Metric

Execution Accuracy

Evaluation Results

Performance of each model on this benchmark

Comparison Table
Model name                                   Execution Accuracy
openmathinstruct-1-a-1-8-million-math        84.7
are-nlp-models-really-able-to-solve-simple   76.9
an-empirical-study-of-data-ability-boundary  85.1
an-empirical-study-of-data-ability-boundary  78.6
athena-mathematical-reasoning-with-thought   86.4
an-empirical-study-of-data-ability-boundary  87.6
are-nlp-models-really-able-to-solve-simple   82.2
are-nlp-models-really-able-to-solve-simple   81.2
athena-mathematical-reasoning-with-thought   91