Multi-Task Language Understanding on BBH-alg
Evaluation Metrics
Average (%)
Evaluation Results
Performance of each model on this benchmark
Comparison Table
Model Name | Average (%) |
---|---|
scaling-instruction-finetuned-language-models | 61.3 |
evaluating-large-language-models-trained-on | 73.9 |
scaling-instruction-finetuned-language-models | 57.6 |
scaling-instruction-finetuned-language-models | 66.5 |
scaling-instruction-finetuned-language-models | 38.3 |
scaling-instruction-finetuned-language-models | 48.2 |
scaling-instruction-finetuned-language-models | 62.2 |