A Collection of 7 Major Mathematical Reasoning Datasets, Covering Arithmetic Reasoning, Symbolic Logic, Visual Mathematics, and Geometric Analysis

With the rapid advancement of large models, mathematical reasoning is evolving from a uniquely human intellectual activity into one of the most challenging frontiers in artificial intelligence. Tasks that once relied on human rationality, such as logical deduction, formula calculation, and multi-step thinking, are now gradually being "understood" and "learned" by machines. However, unlike language comprehension or image recognition, mathematical reasoning requires a model not only to understand the surface meaning of a question but also to grasp the logical structure behind it, which makes model performance particularly dependent on data quality.

Moving models from "calculation" to "reasoning" requires high-quality, structured, and logically coherent data. A systematic, hierarchical, and logically consistent dataset not only determines whether a model can grasp the reasoning principles behind abstract symbols but also influences its ability to generalize and self-correct in open environments. Compared with general natural language corpora, mathematical reasoning datasets place more emphasis on the diversity of the problem distribution, the explainability of solution paths, and the complete annotation of reasoning chains, so that the model's learning process stays as close to human thinking as possible.

Overall, mathematical reasoning is becoming a key window for artificial intelligence to move towards "explainable intelligence." To promote research and application in this area, HyperAI has compiled a series of mathematical reasoning datasets from leading institutions and companies around the world, including Zhejiang University, the University of Hong Kong, NVIDIA, OpenAI, and Alibaba, covering multiple areas including visual mathematics and geometric analysis.

Click to view more open source datasets:

https://go.hyper.ai/CdPJZ

Mathematical Reasoning Dataset Summary

1. We-Math2.0-Standard Benchmark Dataset

Estimated size: 369.86 MB

Download address: https://go.hyper.ai/1dAZ2

We-Math2.0-Standard is a benchmark dataset for visual mathematical reasoning released in 2025 by Beijing University of Posts and Telecommunications, Tencent, and Tsinghua University. The related paper is titled "WE-MATH 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning." The dataset aims to provide a diagnosable, explainable, and comparable basis for evaluation.

Paper address:

https://hyper.ai/en/papers/2508.10433

This dataset builds a unified label space around 1,819 precisely defined knowledge principles. Each question is explicitly annotated with the principles it relies on and rigorously curated, which gives broad and balanced coverage overall and strengthens mathematical subfields and question types that were previously underrepresented. The dataset adopts a dual expansion design:

* First, multiple images per question are used to test the integration and alignment of multi-source visual evidence;

* Second, multiple questions per image are used to test multi-principle transfer and conceptual flexibility within the same visual context.

Each example consists of an image and a text stem, and is accompanied by annotations of the knowledge principles and standard answers that the question relies on.
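The example layout described above can be sketched as a small record type. This is a minimal illustration only: the class name, field names, file names, and principle labels below are hypothetical and are not the schema of the released files. It also shows how per-principle annotations make coverage easy to audit.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class VisualMathExample:
    """Hypothetical record: image(s), text stem, annotated principles, answer."""
    images: List[str]        # one or more image paths (multi-image questions)
    question: str            # the text stem
    principles: List[str]    # knowledge principles the question relies on
    answer: str              # the standard answer


def principle_coverage(examples: List[VisualMathExample]) -> Counter:
    """Count how many examples are annotated with each knowledge principle."""
    return Counter(p for ex in examples for p in ex.principles)


# Made-up sample data, used only to exercise the sketch.
SAMPLES = [
    VisualMathExample(["fig_a.png"], "Find the area of the triangle.",
                      ["triangle area"], "12"),
    VisualMathExample(["fig_a.png"], "Find the perimeter of the triangle.",
                      ["triangle perimeter"], "16"),
    VisualMathExample(["fig_b.png", "fig_c.png"], "Which triangle is larger?",
                      ["triangle area", "comparison"], "B"),
]

COVERAGE = principle_coverage(SAMPLES)
```

The first two samples share one image (multiple questions per image), while the third uses two images for one question, mirroring the dual expansion design.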

2. NuminaMath-LEAN Math Problem Dataset

Estimated size: 65.06 MB

Download address: https://go.hyper.ai/BfJFv

NuminaMath-LEAN is a mathematical problem dataset jointly released by Numina and the Kimi Team in 2025. The related paper is "Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning". It aims to provide manually annotated formal statements and proofs for the training and evaluation of automated theorem proving models.

Paper address:

https://hyper.ai/en/papers/2504.11354

This dataset contains 100,000 math competition problems, including those from authoritative competitions such as the International Mathematical Olympiad (IMO) and the United States Mathematical Olympiad (USAMO). The data types include problem statements, question type classifications, answers, sources, formal proofs, annotator information, and reinforcement learning training process records.
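To give a sense of what a formal statement paired with a proof looks like, here is a toy sketch, assuming Lean 4 syntax. It is not an entry from NuminaMath-LEAN, and the theorem name is hypothetical; real competition problems are far harder and would typically rely on a mathematics library.

```lean
-- Toy illustration only; not taken from the dataset.
-- Formal statement: addition of natural numbers is commutative.
theorem toy_add_comm (a b : Nat) : a + b = b + a := by
  -- Proof: close the goal with the core library lemma.
  exact Nat.add_comm a b
```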

3. T-Wix Russian SFT Dataset

Estimated size: 1.43 GB

Download address: https://go.hyper.ai/5XULu

T-Wix is a Russian supervised fine-tuning (SFT) dataset. The related paper is "From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning". The dataset aims to strengthen a range of model capabilities, from solving algorithmic and mathematical problems to dialogue, logical thinking, and reasoning.

Paper address:

https://arxiv.org/abs/2308.12032

The dataset contains 499,598 Russian-language samples. Of these, 468,614 are general samples covering areas such as mathematics, science, programming, general knowledge, instruction following, and role-playing. The remaining 30,984 are reasoning samples that focus on advanced mathematics and science problems and provide detailed reasoning traces.

4. Nemotron-Math-HumanReasoning Mathematical Reasoning Dataset

Estimated size: 639.91 KB

Download address: https://go.hyper.ai/28kjP

Nemotron-Math-HumanReasoning is a mathematical reasoning dataset released by NVIDIA in 2025. The related paper is "The Challenge of Teaching Reasoning to LLMs Without RL or Distillation". The dataset aims to simulate the extended reasoning style of models such as DeepSeek-R1.

Paper address:

https://arxiv.org/abs/2507.09850

The dataset contains 50 math problems from the OpenMathReasoning dataset, 200 human-written solutions, and an additional 50 solutions generated by QwQ-32B-Preview.

5. Open-Omega-Atom-1.5M Dataset

Estimated size: 6.6 GB

Download address: https://go.hyper.ai/bndWW

Open-Omega-Atom-1.5M is a reasoning dataset designed to enhance model capabilities in mathematics and science.

The dataset contains about 1.5 million entries spanning mathematics, science, and code, with mathematical data making up a substantial part of its composition.

Dataset features:

* Concise and high-quality: Focuses on clear, challenging problems and step-by-step solutions.

* STEM focus: Integrates math, code reasoning, and scientific thinking, with math as the primary emphasis.

* Curated and optimized: Data is selectively sourced from high-quality open datasets and custom data to achieve good diversity and coherence.

* Suitable for reasoning: Offers strong coverage of step-based and logic-based problem solving, and can serve as a benchmark for reasoning engines.

6. GSM8K Mathematical Reasoning Dataset

Estimated size: 4.92 MB

Download address: https://go.hyper.ai/d9PZh

GSM8K is a mathematical reasoning dataset released by OpenAI in 2021. The related paper is "Training Verifiers to Solve Math Word Problems". The dataset aims to improve the performance of machine learning models in understanding and solving multi-step mathematical problems.

Paper address:

https://arxiv.org/abs/2110.14168

This dataset contains 8.5k high-quality, linguistically diverse grade school math word problems, covering algebra, arithmetic, geometry, and other fields. Each problem takes between 2 and 8 steps to solve. Solutions mainly involve a series of simple calculations using basic arithmetic operations (+ − × ÷) to reach the final answer.
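The step-by-step arithmetic structure described above lends itself to automatic checking. The sketch below assumes the solution text carries calculator annotations of the form `<<expression=result>>` and ends with a `#### ` marker before the final answer; verify that convention against the files you actually download. The sample problem and function names are made up for illustration.

```python
import ast
import operator
import re

# Made-up solution text in the assumed annotation style.
SOLUTION = (
    "Each box holds 12 pencils, so 4 boxes hold 4*12 = <<4*12=48>>48 pencils.\n"
    "After giving away 9, 48-9 = <<48-9=39>>39 pencils remain.\n"
    "#### 39"
)

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def _evaluate(node):
    """Evaluate a parsed expression limited to numbers and + - * /."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate(node.operand)
    raise ValueError("unsupported expression")


def final_answer(solution: str) -> str:
    """Return the number that follows the '####' marker, without commas."""
    match = re.search(r"####\s*(-?[\d,]*\.?\d+)", solution)
    if match is None:
        raise ValueError("no final-answer marker found")
    return match.group(1).replace(",", "")


def steps_are_consistent(solution: str) -> bool:
    """Recompute every <<expression=result>> annotation and compare."""
    for expr, result in re.findall(r"<<([^=<>]+)=([^<>]+)>>", solution):
        value = _evaluate(ast.parse(expr.strip(), mode="eval"))
        if abs(value - float(result)) > 1e-9:
            return False
    return True
```

Here `final_answer(SOLUTION)` yields the string `"39"`, and `steps_are_consistent(SOLUTION)` confirms that both intermediate calculations match their stated results. Parsing with `ast` instead of calling `eval` keeps the checker restricted to the four basic operations.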

7. VCBench Mathematical Reasoning Benchmark Dataset

Estimated size: 86.04 MB

Download address: https://hyper.ai/cn/datasets/43960

VCBench is a benchmark dataset for evaluating multimodal mathematical reasoning with explicit visual dependencies, released by Alibaba and Zhejiang University in 2025. The dataset contains 1,720 question-answer pairs and a total of 6,697 images.

The questions mainly include the following 6 areas:

* Time and Calendar: Temporal reasoning questions across two subcategories (Calendar and Clock), requiring an understanding of time intervals and calendar-based calculations.

* Space and Position: Spatial reasoning challenges across three subcategories (Direction, Position, and Place) that assess understanding of relative position, direction, and spatial relationships.

* Geometry and Shapes: Questions covering five subcategories (Angles, Quadrilaterals, Rectangles, Shapes, and Triangles) that test geometric understanding, from basic shape recognition to more complex property analysis.

* Objects and Motion: Tasks in two subcategories (Cube and Move) that assess understanding of three-dimensional objects and motion transformations.

* Reasoning and Observation: Questions in two subcategories (Inference and Observation) designed to test logical reasoning and careful visual observation.

* Organization and Patterns: Challenges across three subcategories (Organization, Patterns, and Weighting) that assess pattern recognition, sequencing, and organizational logic.
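The six areas and their subcategories listed above can be captured in a small lookup table, which is handy when grouping evaluation results by area. This is a sketch only: the label strings are copied from the list above, and the labels used in the released files may be spelled differently. The function names are hypothetical.

```python
# Area -> subcategories, transcribed from the list above.
VCBENCH_TAXONOMY = {
    "Time and Calendar": ["Calendar", "Clock"],
    "Space and Position": ["Direction", "Position", "Place"],
    "Geometry and Shapes": ["Angles", "Quadrilaterals", "Rectangles",
                            "Shapes", "Triangles"],
    "Objects and Motion": ["Cube", "Move"],
    "Reasoning and Observation": ["Inference", "Observation"],
    "Organization and Patterns": ["Organization", "Patterns", "Weighting"],
}


def area_of(subcategory: str) -> str:
    """Return the area that a subcategory belongs to (case-insensitive)."""
    key = subcategory.strip().lower()
    for area, subcategories in VCBENCH_TAXONOMY.items():
        if key in (name.lower() for name in subcategories):
            return area
    raise KeyError(subcategory)


def subcategory_count() -> int:
    """Total number of subcategories across all areas."""
    return sum(len(names) for names in VCBENCH_TAXONOMY.values())
```

Summing the per-area counts (2 + 3 + 5 + 2 + 2 + 3) gives 17 subcategories in total.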

That concludes this issue's recommended datasets. All of them are available for one-click download.