HyperAI초신경

Multi-Task Language Understanding on BBH-alg

Evaluation Metrics

Average (%)

Evaluation Results

Performance results of each model on this benchmark

Comparison Table
Model Name                                       Average (%)
scaling-instruction-finetuned-language-models    61.3
evaluating-large-language-models-trained-on      73.9
scaling-instruction-finetuned-language-models    57.6
scaling-instruction-finetuned-language-models    66.5
scaling-instruction-finetuned-language-models    38.3
scaling-instruction-finetuned-language-models    48.2
scaling-instruction-finetuned-language-models    62.2