Math Word Problem Solving on GSM-Plus

1:1 Accuracy

Evaluation Results

Performance of each model on this benchmark

Model	1:1 Accuracy	Paper Title
GPT-4	85.6	GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers
