DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

We introduce DeepSeek-Prover-V1.5, an open-source language model designed fortheorem proving in Lean 4, which enhances DeepSeek-Prover-V1 by optimizing bothtraining and inference processes. Pre-trained on DeepSeekMath-Base withspecialization in formal mathematical languages, the model undergoes supervisedfine-tuning using an enhanced formal theorem proving dataset derived fromDeepSeek-Prover-V1. Further refinement is achieved through reinforcementlearning from proof assistant feedback (RLPAF). Beyond the single-passwhole-proof generation approach of DeepSeek-Prover-V1, we propose RMaxTS, avariant of Monte-Carlo tree search that employs an intrinsic-reward-drivenexploration strategy to generate diverse proof paths. DeepSeek-Prover-V1.5demonstrates significant improvements over DeepSeek-Prover-V1, achieving newstate-of-the-art results on the test set of the high school level miniF2Fbenchmark (63.5%) and the undergraduate level ProofNet benchmark (25.3%).