HyperAI

Multi-Task Language Understanding on BBH-NLP

Evaluation Metrics

Average (%)

Evaluation Results

Performance of each model on this benchmark

Comparison Table

Model Name                                     Average (%)
Model 1                                        86.3
scaling-instruction-finetuned-language-models  71.2
orca-2-teaching-small-language-models-how-to   45.93
scaling-instruction-finetuned-language-models  62.7
scaling-instruction-finetuned-language-models  70.0
scaling-instruction-finetuned-language-models  78.4
scaling-instruction-finetuned-language-models  78.2
orca-2-teaching-small-language-models-how-to   50.18
evaluating-large-language-models-trained-on    73.5
Model 10                                       82.4
Model 11                                       86.1
scaling-instruction-finetuned-language-models  72.4
Model 13                                       85.9
Model 14                                       84.07
Model 15                                       81.0