Mathematical Reasoning
Benchmark List
All benchmarks related to this task
aime24
Best model: DeepSeek-R1
lila-ood
Best model: Codex (Few-Shot, 175B)
math500
Best model: Search-o1
unigeo
Best model: GOLD
amc23
frontiermath
geoqa
lila-iid
mmlu-mathematics
pgps9k
unigeo-prv