OpenMathReasoning Mathematical Reasoning Dataset
The OpenMathReasoning dataset is a large-scale, high-quality dataset focused on mathematical reasoning, released by NVIDIA in 2025. It is described in the paper "AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset", and it was built to help the OpenMath-Nemotron series of models achieve strong results in mathematical reasoning.
The dataset provides fine-grained, multi-dimensional annotations, including problem-type labels, detailed step-by-step solutions, and difficulty levels. Drawn from professional mathematical sources and online communities, this data supports in-depth research on the mathematical reasoning process and on improving math problem-solving models. It can also benefit applications such as intelligent math tutoring systems, math competition aids, and automated scientific computing.
The dataset contains:
- 540K unique math problems from the Art of Problem Solving (AoPS) forum
- 3.2M long chain-of-thought (CoT) solutions
- 1.7M long tool-integrated reasoning (TIR) solutions
- 566K GenSelect samples for selecting the most promising solution from many candidates
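As a rough illustration of the training mix, the following Python sketch computes each solution type's share of the roughly 5.47M solution samples listed above. The counts are the rounded figures from this page, not exact split sizes, and the dictionary keys (`cot`, `tir`, `genselect`) are labels chosen for this sketch only.

```python
# Approximate sample counts as listed on this page (rounded, not exact split sizes).
# The keys are illustrative labels chosen for this sketch.
SPLITS = {
    "cot": 3_200_000,      # long chain-of-thought solutions
    "tir": 1_700_000,      # long tool-integrated reasoning solutions
    "genselect": 566_000,  # GenSelect solution-selection samples
}


def split_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each split's fraction of the total number of samples."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    total = sum(SPLITS.values())
    print(f"total samples: {total:,}")
    for name, share in split_shares(SPLITS).items():
        print(f"{name:>10}: {share:.1%}")
```

With these rounded counts, CoT solutions make up about 58.5% of the samples, TIR solutions about 31.1%, and GenSelect samples about 10.4%.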