NuminaMath-LEAN Math Problem Dataset
Date
Publish URL
License
Apache 2.0
NuminaMath-LEAN is a mathematical problem dataset jointly released by Numina and Kimi Team in 2025. The related paper results are "Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning", which aims to provide manually annotated formal statements and proofs for the training and evaluation of automated theorem proving models.
This dataset contains 100,000 math competition problems, including those from authoritative competitions such as the International Mathematical Olympiad (IMO) and the United States Mathematical Olympiad (USAMO). The data types include problem statements, question type classifications, answers, sources, formal proofs, annotator information, and reinforcement learning training process records.