Multi-task Language Understanding on MGSM
Evaluation Metric
Average (%)
Evaluation Results
Performance results of each model on this benchmark
Comparison Table
Model | Average (%) |
---|---|
transcending-scaling-laws-with-0-1-extra | 49.9 |
palm-scaling-language-modeling-with-pathways-1 | 55.0 |
palm-2-technical-report-1 | 87.0 |
scaling-instruction-finetuned-language-models | 60.4 |
scaling-instruction-finetuned-language-models | 72.0 |
scaling-instruction-finetuned-language-models | 35 |
scaling-instruction-finetuned-language-models | 57.0 |
scaling-instruction-finetuned-language-models | 5.7 |
scaling-instruction-finetuned-language-models | 36 |
scaling-instruction-finetuned-language-models | 21.2 |
scaling-instruction-finetuned-language-models | 23.7 |
palm-2-technical-report-1 | 72.2 |