Automated Theorem Proving On Minif2F Test

평가 지표

ITP

Pass@32

cumulative

평가 결과

이 벤치마크에서 각 모델의 성능 결과

				Paper Title
Subgoal-XL	Isabelle	39.3	56.1	SubgoalXL: Subgoal-based Expert Learning for Theorem Proving
Evariste	Lean	-	41	HyperTree Proof Search for Neural Theorem Proving
Lean GPT-f	Lean	29.2	29.2	MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics
Lean Expert Iteration	Lean	34.5	36.6	Formal Mathematics Statement Curriculum Learning
GPT-f	Metamath	-	36.6	HyperTree Proof Search for Neural Theorem Proving
Evariste-1d	Lean	-	38.9	HyperTree Proof Search for Neural Theorem Proving
LLEMMA-7b	Lean	26.2	26.2	Llemma: An Open Language Model For Mathematics
Decomposing the Enigma	Isabelle	-	45.5	Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving
COPRA + GPT-3.5	Lean	-	11.9	An In-Context Learning Agent for Formal Theorem-Proving
MMOS-DeepSeekMath-7B	Lean	-	28.3	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning
DeepSeek-Prover-V1.5	Lean	50.0	63.5	DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Lean tidy	Lean	-	18	MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics
Metamath GPT-f	Metamath	-	1.6	MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics
COPRA + GPT-4	Lean	-	23.3	An In-Context Learning Agent for Formal Theorem-Proving
Thor + expert iteration on autoformalised theorems	Isabelle	-	35.2	-
LLEMMA-34b	Lean	25.8	25.8	Llemma: An Open Language Model For Mathematics
ReProver	Lean	26.5	26.5	-
DeepSeek-Prover	Lean	-	52.0	DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Sledgehammer	Isabelle	-	10.4	Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers
DSP (540B Minerva informal)	Isabelle	-	38.9	Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs

0 of 28 row(s) selected.

Command Palette

Automated Theorem Proving On Minif2F Test

평가 지표

평가 결과