HyperAI

OpenMathReasoning Mathematical Reasoning Dataset

Date

3 days ago

Organization

NVIDIA

Publish URL

huggingface.co

The OpenMathReasoning dataset is a large-scale, high-quality dataset focused on mathematical reasoning, released by NVIDIA in 2025. The accompanying paper is "AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset". The dataset aims to help the OpenMath-Nemotron series of models achieve strong results in mathematical reasoning.

The dataset carries fine-grained annotations, including problem type labels, detailed solution steps, and difficulty levels. Drawn from professional mathematical sources and online communities, the data supports in-depth research on the mathematical reasoning process and the optimization of problem-solving models. It also has applications in related areas such as intelligent math tutoring systems, math competition aids, and automated scientific computing.

The dataset contains:

  • 540K unique math problems from the Art of Problem Solving (AoPS) forum
  • 3.2M long chain-of-thought (CoT) solutions
  • 1.7M long Tool-Integrated Reasoning (TIR) solutions
  • 566K samples that select the most promising solution from many candidates (GenSelect)
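
As a rough illustration of how the three sample types compare in size, the sketch below tallies the counts listed above. The figures are the rounded values from the list, not exact row counts, and the dictionary keys (`cot`, `tir`, `genselect`) are labels chosen here for illustration, not confirmed field or split names.

```python
# Rounded sample counts taken from the list above (assumed approximate).
# The keys are illustrative labels, not confirmed split names.
COMPONENT_COUNTS = {
    "cot": 3_200_000,       # long chain-of-thought solutions
    "tir": 1_700_000,       # long Tool-Integrated Reasoning solutions
    "genselect": 566_000,   # solution-selection samples
}


def total_samples(counts):
    """Return the total number of samples across all components."""
    return sum(counts.values())


def component_shares(counts):
    """Return each component's fraction of the total sample count."""
    total = total_samples(counts)
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    print(f"total samples: {total_samples(COMPONENT_COUNTS):,}")
    for name, share in component_shares(COMPONENT_COUNTS).items():
        print(f"{name}: {share:.1%}")
```

On these rounded figures the three components add up to about 5.47M samples, with CoT solutions making up the largest share.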