Multi-Task Language Understanding on BBH-alg
Evaluation Metric
Average (%)
Evaluation Results
Performance of each model on this benchmark
Comparison Table
Model Name | Average (%) |
---|---|
scaling-instruction-finetuned-language-models | 61.3 |
evaluating-large-language-models-trained-on | 73.9 |
scaling-instruction-finetuned-language-models | 57.6 |
scaling-instruction-finetuned-language-models | 66.5 |
scaling-instruction-finetuned-language-models | 38.3 |
scaling-instruction-finetuned-language-models | 48.2 |
scaling-instruction-finetuned-language-models | 62.2 |