HyperAI

Mathematical Reasoning on Lila-OOD

Evaluation Metric

Accuracy

Evaluation Results

Performance of each model on this benchmark

Model Name | Accuracy | Paper Title | Repository
Bhāskara-P (Fine-tuned, 2.7B) | 0.448 | Lila: A Unified Benchmark for Mathematical Reasoning | -
Bhāskara-A (Fine-tuned, 2.7B) | 0.268 | Lila: A Unified Benchmark for Mathematical Reasoning | -
Codex (Few-Shot, 175B) | 0.586 | Lila: A Unified Benchmark for Mathematical Reasoning | -
GPT-3 (Few-Shot, 175B) | 0.384 | Lila: A Unified Benchmark for Mathematical Reasoning | -
Neo-A (Fine-tuned, 2.7B) | 0.177 | Lila: A Unified Benchmark for Mathematical Reasoning | -
Neo-P (Fine-tuned, 2.7B) | 0.238 | Lila: A Unified Benchmark for Mathematical Reasoning | -