Mathematical Reasoning on Lila OOD
Evaluation Metric
Accuracy
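Accuracy here is the fraction of test problems a model answers correctly, reported on a 0 to 1 scale (0.586 means 58.6%). The following minimal sketch shows that generic computation. It assumes plain exact string matching after trimming whitespace. Lila's own answer-matching and normalization rules may differ, and the `accuracy` helper is a hypothetical name introduced only for illustration.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Return the fraction of predictions that match their reference answer.

    Assumption: a plain exact string match after stripping surrounding
    whitespace. The benchmark's real scoring may normalize answers differently.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count positions where the trimmed prediction equals the trimmed reference.
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)


# Usage with made-up answers (not Lila data): 2 of 3 are correct.
print(accuracy(["4", "10", "7"], ["4", "12", "7"]))  # prints 0.6666666666666666
```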
Evaluation Results
Performance of each model on this benchmark:
Model | Accuracy | Paper Title | Repository |
---|---|---|---|
Bhāskara-P (Fine-tuned, 2.7B) | 0.448 | Lila: A Unified Benchmark for Mathematical Reasoning | |
Bhāskara-A (Fine-tuned, 2.7B) | 0.268 | Lila: A Unified Benchmark for Mathematical Reasoning | |
Codex (Few-Shot, 175B) | 0.586 | Lila: A Unified Benchmark for Mathematical Reasoning | |
GPT-3 (Few-Shot, 175B) | 0.384 | Lila: A Unified Benchmark for Mathematical Reasoning | |
Neo-A (Fine-tuned, 2.7B) | 0.177 | Lila: A Unified Benchmark for Mathematical Reasoning | |
Neo-P (Fine-tuned, 2.7B) | 0.238 | Lila: A Unified Benchmark for Mathematical Reasoning | |