Logical Reasoning On Big Bench Temporal
Evaluation Metrics
Accuracy
Evaluation Results
Performance of each model on this benchmark
Comparison Table
Model Name | Accuracy |
---|---|
bloomberggpt-a-large-language-model-for | 29.2 |
scaling-language-models-methods-analysis-1 | 19.0 |
bloomberggpt-a-large-language-model-for | 23.6 |
palm-2-technical-report-1 | 100 |
bloomberggpt-a-large-language-model-for | 39.6 |
bloomberggpt-a-large-language-model-for | 36.8 |
training-compute-optimal-large-language | 32.0 |
bloomberggpt-a-large-language-model-for | 21.2 |
palm-2-technical-report-1 | 96.4 |