Question Answering on DROP
Evaluation Metrics
Accuracy
Results
Performance of each model on this benchmark
Model | Accuracy | Paper Title | Repository |
---|---|---|---|
PaLM 540B (Self Consistency) | 78.2 | Large Language Models Can Self-Improve | - |
PaLM 540B (Self Improvement, Self Consistency) | 83 | Large Language Models Can Self-Improve | - |
PaLM 540B (Self Improvement, Standard-Prompting) | 71.7 | Large Language Models Can Self-Improve | - |
PaLM 540B (Standard-Prompting) | 60 | Large Language Models Can Self-Improve | - |
PaLM 540B (CoT Prompting) | 70.6 | Large Language Models Can Self-Improve | - |
PaLM 540B (Self Improvement, CoT Prompting) | 76.2 | Large Language Models Can Self-Improve | - |