Mathematical Reasoning on Lila (IID)
Evaluation Metric
Accuracy
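As a reference point, below is a minimal sketch of how an exact-match Accuracy score of this kind can be computed, assuming string-valued predictions and gold answers; the function name and the whitespace-trimming normalization are illustrative assumptions, and the official Lila evaluation code may normalize answers differently.

```python
def accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference answer.

    A minimal sketch of an exact-match Accuracy metric; the normalization
    used by the official Lila evaluation may differ.
    """
    assert len(predictions) == len(references)
    correct = sum(
        p.strip() == r.strip()  # exact string match after trimming whitespace
        for p, r in zip(predictions, references)
    )
    return correct / len(references)


# Example: 2 of 3 answers match, so accuracy is ~0.667.
print(accuracy(["42", "7", "3.5"], ["42", "8", "3.5"]))
```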
Evaluation Results
Performance of each model on this benchmark
Model Name | Accuracy | Paper Title | Repository |
---|---|---|---|
Codex (Few-Shot, 175B) | 0.604 | Lila: A Unified Benchmark for Mathematical Reasoning | |
Bhāskara-P (Fine-tuned, 2.7B) | 0.480 | Lila: A Unified Benchmark for Mathematical Reasoning | |
Neo-P (Fine-tuned, 2.7B) | 0.394 | Lila: A Unified Benchmark for Mathematical Reasoning | |
GPT-3 (Few-Shot, 175B) | 0.384 | Lila: A Unified Benchmark for Mathematical Reasoning | |
Bhāskara-A (Fine-tuned, 2.7B) | 0.252 | Lila: A Unified Benchmark for Mathematical Reasoning | |
Neo-A (Fine-tuned, 2.7B) | 0.204 | Lila: A Unified Benchmark for Mathematical Reasoning | |