Mathematical Reasoning on FrontierMath
Metrics
Accuracy
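Accuracy here is the fraction of problems whose final answer matches the reference answer. A minimal sketch of that computation, assuming answers have been normalized to strings; the `Result` fields and values below are illustrative, not taken from the FrontierMath release:

```python
from dataclasses import dataclass

@dataclass
class Result:
    answer: str      # model's final answer, normalized to a string
    reference: str   # ground-truth answer

def accuracy(results: list[Result]) -> float:
    """Fraction of problems answered exactly correctly."""
    if not results:
        return 0.0
    correct = sum(r.answer == r.reference for r in results)
    return correct / len(results)

# Example: 2 correct out of 3 problems -> accuracy ~0.667
print(accuracy([Result("42", "42"), Result("7", "7"), Result("x", "y")]))
```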
Results
Performance results of various models on this benchmark
| Model | Accuracy | Paper Title |
|---|---|---|
| o3 | 0.252 | - |
| Gemini 1.5 Pro (002) | 0.02 | FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI |
| o1-mini | 0.01 | - |
| o1-preview | 0.01 | - |
| Claude 3.5 Sonnet | 0.01 | - |
| GPT-4o | 0.01 | - |