Multi-Task Language Understanding on BBH-alg
Evaluation Metric
Average (%)
Evaluation Results
Performance results of each model on this benchmark
Comparison Table
Model Name | Average (%) |
---|---|
scaling-instruction-finetuned-language-models | 61.3 |
evaluating-large-language-models-trained-on | 73.9 |
scaling-instruction-finetuned-language-models | 57.6 |
scaling-instruction-finetuned-language-models | 66.5 |
scaling-instruction-finetuned-language-models | 38.3 |
scaling-instruction-finetuned-language-models | 48.2 |
scaling-instruction-finetuned-language-models | 62.2 |