Automated Theorem Proving On Minif2F Valid
평가 지표
Pass@64
평가 결과
이 벤치마크에서 각 모델의 성능 결과
모델 이름 | Pass@64 | Paper Title | Repository |
---|---|---|---|
GPT-f | 47.3 | HyperTree Proof Search for Neural Theorem Proving | - |
Evariste-1d | 46.7 | HyperTree Proof Search for Neural Theorem Proving | - |
Metamath GPT-f | - | MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics | |
Lean GPT-f | - | MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics | |
Evariste-7d | 47.5 | HyperTree Proof Search for Neural Theorem Proving | - |
Evariste | 58.6 | HyperTree Proof Search for Neural Theorem Proving | - |
Lean tidy | - | MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics | |
DSP (62B Minerva informal) | - | Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs | - |
Lyra + GPT-4 | - | Lyra: Orchestrating Dual Correction in Automated Theorem Proving | |
LEGO-Prover ChatGPT | - | LEGO-Prover: Neural Theorem Proving with Growing Libraries |
0 of 10 row(s) selected.