HyperAI초신경

Automated Theorem Proving On Minif2F Test

평가 지표

ITP
Pass@32
cumulative

평가 결과

이 벤치마크에서 각 모델의 성능 결과

비교 표
모델 이름ITPPass@32cumulative
subgoalxl-subgoal-based-expert-learning-forIsabelle39.356.1
hypertree-proof-search-for-neural-theoremLean-41
minif2f-a-cross-system-benchmark-for-formalLean29.229.2
formal-mathematics-statement-curriculumLean34.536.6
hypertree-proof-search-for-neural-theoremMetamath-36.6
hypertree-proof-search-for-neural-theoremLean-38.9
llemma-an-open-language-model-for-mathematicsLean26.226.2
decomposing-the-enigma-subgoal-basedIsabelle-45.5
a-language-agent-approach-to-formal-theoremLean-11.9
an-empirical-study-of-data-ability-boundaryLean-28.3
deepseek-prover-v1-5-harnessing-proofLean50.063.5
minif2f-a-cross-system-benchmark-for-formalLean-18
minif2f-a-cross-system-benchmark-for-formalMetamath-1.6
a-language-agent-approach-to-formal-theoremLean-23.3
모델 15Isabelle-35.2
llemma-an-open-language-model-for-mathematicsLean25.825.8
모델 17Lean26.526.5
deepseek-prover-advancing-theorem-proving-inLean-52.0
thor-wielding-hammers-to-integrate-languageIsabelle-10.4
draft-sketch-and-prove-guiding-formal-theoremIsabelle-38.9
lego-prover-neural-theorem-proving-withIsabelle-47.1
a-language-agent-approach-to-formal-theoremLean-30.7
proof-artifact-co-training-for-theoremIsabelle-24.6
hypertree-proof-search-for-neural-theoremLean-40.6
efficient-neural-theorem-proving-via-fineIsabelle-66.0
draft-sketch-and-prove-guiding-formal-theoremIsabelle-20.9
thor-wielding-hammers-to-integrate-languageIsabelle-29.9
lyra-orchestrating-dual-correction-inIsabelle-47.1