Automated Theorem Proving On Minif2F Test
Metrics
ITP
Pass@32
cumulative
Results
Performance results of various models on this benchmark
Comparison Table
Model Name | ITP | Pass@32 | cumulative |
---|---|---|---|
subgoalxl-subgoal-based-expert-learning-for | Isabelle | 39.3 | 56.1 |
hypertree-proof-search-for-neural-theorem | Lean | - | 41 |
minif2f-a-cross-system-benchmark-for-formal | Lean | 29.2 | 29.2 |
formal-mathematics-statement-curriculum | Lean | 34.5 | 36.6 |
hypertree-proof-search-for-neural-theorem | Metamath | - | 36.6 |
hypertree-proof-search-for-neural-theorem | Lean | - | 38.9 |
llemma-an-open-language-model-for-mathematics | Lean | 26.2 | 26.2 |
decomposing-the-enigma-subgoal-based | Isabelle | - | 45.5 |
a-language-agent-approach-to-formal-theorem | Lean | - | 11.9 |
an-empirical-study-of-data-ability-boundary | Lean | - | 28.3 |
deepseek-prover-v1-5-harnessing-proof | Lean | 50.0 | 63.5 |
minif2f-a-cross-system-benchmark-for-formal | Lean | - | 18 |
minif2f-a-cross-system-benchmark-for-formal | Metamath | - | 1.6 |
a-language-agent-approach-to-formal-theorem | Lean | - | 23.3 |
Model 15 | Isabelle | - | 35.2 |
llemma-an-open-language-model-for-mathematics | Lean | 25.8 | 25.8 |
Model 17 | Lean | 26.5 | 26.5 |
deepseek-prover-advancing-theorem-proving-in | Lean | - | 52.0 |
thor-wielding-hammers-to-integrate-language | Isabelle | - | 10.4 |
draft-sketch-and-prove-guiding-formal-theorem | Isabelle | - | 38.9 |
lego-prover-neural-theorem-proving-with | Isabelle | - | 47.1 |
a-language-agent-approach-to-formal-theorem | Lean | - | 30.7 |
proof-artifact-co-training-for-theorem | Isabelle | - | 24.6 |
hypertree-proof-search-for-neural-theorem | Lean | - | 40.6 |
efficient-neural-theorem-proving-via-fine | Isabelle | - | 66.0 |
draft-sketch-and-prove-guiding-formal-theorem | Isabelle | - | 20.9 |
thor-wielding-hammers-to-integrate-language | Isabelle | - | 29.9 |
lyra-orchestrating-dual-correction-in | Isabelle | - | 47.1 |