Mathematical Reasoning on Lila (OOD)
Evaluation Metric
Accuracy
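The leaderboard reports accuracy as a fraction between 0 and 1. The sketch below is only an illustration. It assumes that accuracy is the share of problems whose predicted answer exactly matches the reference answer. The official Lila scorer may normalize or compare answers differently. The function name `accuracy` and the sample answers are hypothetical.

```python
def accuracy(predictions, references):
    """Fraction of predictions that exactly match their reference answer.

    Hypothetical helper: assumes plain exact-match scoring with no answer
    normalization, which may differ from the official Lila evaluation.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count positions where the predicted answer equals the reference exactly.
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Hypothetical example: "0.5" vs "1/2" counts as wrong under exact string match.
score = accuracy(["4", "12", "0.5"], ["4", "12", "1/2"])
print(round(score, 3))  # prints 0.667
```

The last example shows why answer normalization matters. Under strict string matching, "0.5" and "1/2" count as different answers. A scorer that compares numeric values would treat them as equal.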
Evaluation Results
Performance of each model on this benchmark
| Model | Accuracy | Paper Title |
|---|---|---|
| Codex (Few-Shot, 175B) | 0.586 | Lila: A Unified Benchmark for Mathematical Reasoning |
| Bhāskara-P (Fine-tuned, 2.7B) | 0.448 | Lila: A Unified Benchmark for Mathematical Reasoning |
| GPT-3 (Few-Shot, 175B) | 0.384 | Lila: A Unified Benchmark for Mathematical Reasoning |
| Bhāskara-A (Fine-tuned, 2.7B) | 0.268 | Lila: A Unified Benchmark for Mathematical Reasoning |
| Neo-P (Fine-tuned, 2.7B) | 0.238 | Lila: A Unified Benchmark for Mathematical Reasoning |
| Neo-A (Fine-tuned, 2.7B) | 0.177 | Lila: A Unified Benchmark for Mathematical Reasoning |