OpenMathReasoning Mathematical Reasoning Dataset
The OpenMathReasoning dataset is a large-scale, high-quality dataset focused on mathematical reasoning, released by NVIDIA in 2025. It accompanies the paper "AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset" and is intended to help the OpenMath-Nemotron series of models achieve strong results in mathematical reasoning.
The dataset carries fine-grained annotations along several dimensions, including problem type labels, detailed solution steps, and difficulty levels. Drawn from professional mathematics sources and online communities, the data supports research on the mathematical reasoning process and the optimization of problem-solving models, with applications such as intelligent math tutoring systems, math competition aids, and automated scientific computing.
The dataset contains 540K unique math problems from the AoPS forum, including:
- 3.2M long chain-of-thought (CoT) solutions
- 1.7M long tool-integrated reasoning (TIR) solutions
- 566K samples that select the most promising solution from many candidates (GenSelect)
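
To make the composition concrete, the following minimal sketch tallies the sample counts listed above and shows one possible way to stream a subset with the Hugging Face `datasets` library. The repository ID `nvidia/OpenMathReasoning` and the split names `cot`, `tir`, and `genselect` are assumptions that do not appear in this description; check them against the published dataset card before relying on them.

```python
# Approximate sample counts taken from the list above.
SAMPLE_COUNTS = {"cot": 3_200_000, "tir": 1_700_000, "genselect": 566_000}


def composition(counts):
    """Return the total sample count and each subset's share of that total."""
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.items()}
    return total, shares


def stream_split(split, repo_id="nvidia/OpenMathReasoning"):
    """Lazily iterate over one subset without downloading it in full.

    ASSUMPTION: repo_id and the split names are not given in the text above.
    """
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(repo_id, split=split, streaming=True)


total, shares = composition(SAMPLE_COUNTS)
print(f"total samples: {total:,}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

`stream_split` is defined but not called here, so the script runs offline; calling it, for example `next(iter(stream_split("cot")))`, needs network access and the `datasets` package.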