Question Answering on DROP
Evaluation Metric
Accuracy
Evaluation Results
Performance of each model on this benchmark
Model Name | Accuracy | Paper Title | Repository |
---|---|---|---|
PaLM 540B (Self Consistency) | 78.2 | Large Language Models Can Self-Improve | - |
PaLM 540B (Self Improvement, Self Consistency) | 83 | Large Language Models Can Self-Improve | - |
PaLM 540B (Self Improvement, Standard-Prompting) | 71.7 | Large Language Models Can Self-Improve | - |
PaLM 540B (Standard-Prompting) | 60 | Large Language Models Can Self-Improve | - |
PaLM 540B (CoT Prompting) | 70.6 | Large Language Models Can Self-Improve | - |
PaLM 540B (Self Improvement, CoT Prompting) | 76.2 | Large Language Models Can Self-Improve | - |
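The rows pair each PaLM 540B prompting method with and without Self Improvement. The following is a minimal sketch that derives the per-method accuracy gain from the numbers reported in the table. The dictionary layout and the function name `self_improvement_gains` are hypothetical and used only for illustration. The snippet does not evaluate any model; it only does arithmetic on the published scores.

```python
# Accuracy values copied from the table (PaLM 540B on DROP).
# The dictionary structure is a hypothetical layout chosen for illustration.
RESULTS = {
    "Standard-Prompting": {"baseline": 60.0, "self_improvement": 71.7},
    "CoT Prompting": {"baseline": 70.6, "self_improvement": 76.2},
    "Self Consistency": {"baseline": 78.2, "self_improvement": 83.0},
}


def self_improvement_gains(results):
    """Return the accuracy gain (in points) from Self Improvement per prompting method."""
    # Round to one decimal to match the table's precision and hide float noise.
    return {
        method: round(scores["self_improvement"] - scores["baseline"], 1)
        for method, scores in results.items()
    }


if __name__ == "__main__":
    for method, gain in self_improvement_gains(RESULTS).items():
        print(f"{method}: +{gain}")
```

On these reported numbers, the gain is largest for standard prompting (+11.7) and smallest when Self Consistency is already in use (+4.8).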