Multi-Task Language Understanding on MGSM
Evaluation Metric
Average (%)
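The scores below are commonly reported as exact-match accuracy on the final answer, averaged across MGSM's per-language subsets and expressed as a percentage. The following is a minimal sketch of that aggregation, assuming per-example correctness flags are already available; the function names and data shapes are illustrative, not the benchmark's official evaluation harness.

```python
from typing import Dict, List


def mgsm_average(results: Dict[str, List[bool]]) -> float:
    """Average per-language accuracy, then report the result as a percentage.

    `results` maps a language code to a list of per-example correctness flags
    (True if the model's final answer exactly matched the gold answer).
    """
    per_language = [sum(flags) / len(flags) for flags in results.values()]
    return 100.0 * sum(per_language) / len(per_language)


# Example with two hypothetical language subsets.
scores = {"en": [True, True, False], "de": [True, False, False]}
print(f"Average (%): {mgsm_average(scores):.1f}")  # -> Average (%): 50.0
```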
Evaluation Results
Performance of each model on this benchmark.
| Model | Average (%) | Paper Title | Repository |
|---|---|---|---|
| U-PaLM 540B (CoT) | 49.9 | Transcending Scaling Laws with 0.1% Extra Compute | - |
| PaLM 540B | 55.0 | PaLM: Scaling Language Modeling with Pathways | - |
| PaLM 2 (few-shot, k=8, SC) | 87.0 | PaLM 2 Technical Report | - |
| Flan-U-PaLM 540B (CoT) | 60.4 | Scaling Instruction-Finetuned Language Models | - |
| Flan-PaLM 540B (8-shot, fine-tuned, CoT + SC) | 72.0 | Scaling Instruction-Finetuned Language Models | - |
| code-davinci-002 | 35 | Scaling Instruction-Finetuned Language Models | - |
| Flan-PaLM 540B (8-shot, fine-tuned, CoT) | 57.0 | Scaling Instruction-Finetuned Language Models | - |
| GPT-3 Davinci 175B | 5.7 | Scaling Instruction-Finetuned Language Models | - |
| text-davinci-003 | 36 | Scaling Instruction-Finetuned Language Models | - |
| Flan-PaLM 540B (8-shot, fine-tuned) | 21.2 | Scaling Instruction-Finetuned Language Models | - |
| text-davinci-002 | 23.7 | Scaling Instruction-Finetuned Language Models | - |
| PaLM 2 (8-shot, CoT) | 72.2 | PaLM 2 Technical Report | - |