HyperAI초신경

Logical Reasoning On Lingoly

Evaluation Metrics

Delta_NoContext
Exact Match Accuracy
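
The page lists the two metrics without defining them. The sketch below shows one way they could be computed. It assumes that Exact Match Accuracy is the share of answers that match the reference exactly after trimming whitespace, and that Delta_NoContext is the exact match score with the puzzle context minus the score without it. Both definitions, the function names, and the toy answers are illustrative assumptions, not taken from this page.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions equal to their reference (whitespace-trimmed).

    Assumed definition; the benchmark's own scorer may normalise differently.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


def delta_no_context(acc_with_context, acc_without_context):
    """Assumed definition: accuracy gained when the model sees the context."""
    return acc_with_context - acc_without_context


# Toy data (hypothetical): four reference answers and two sets of model answers.
references = ["a", "b", "c", "d"]
answers_with_context = ["a", "b", "x", "d"]      # 3 of 4 correct
answers_without_context = ["a", "y", "x", "z"]   # 1 of 4 correct

em_ctx = exact_match_accuracy(answers_with_context, references)
em_no_ctx = exact_match_accuracy(answers_without_context, references)
delta = delta_no_context(em_ctx, em_no_ctx)

print(f"Exact Match Accuracy: {em_ctx:.1%}")
print(f"Delta_NoContext:      {delta:.1%}")
```

Under this reading, a Delta_NoContext well below the Exact Match Accuracy means part of the score is reachable without the context at all.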

Evaluation Results

Performance results of each model on this benchmark

Comparison Table
Model Name                               Delta_NoContext   Exact Match Accuracy
lingoly-a-benchmark-of-olympiad-level    23.4%             32.1%
lingoly-a-benchmark-of-olympiad-level    21.5%             33.4%
lingoly-a-benchmark-of-olympiad-level    11.2%             21.2%
lingoly-a-benchmark-of-olympiad-level    28.8%             46.3%
lingoly-a-benchmark-of-olympiad-level    11.6%             21.5%
lingoly-a-benchmark-of-olympiad-level    4.9%              11.4%
lingoly-a-benchmark-of-olympiad-level    2.9%              10.3%
lingoly-a-benchmark-of-olympiad-level    1.1%              6.4%
lingoly-a-benchmark-of-olympiad-level    25.1%             37.6%
lingoly-a-benchmark-of-olympiad-level    6.4%              14.2%
lingoly-a-benchmark-of-olympiad-level    2.2%              4.9%