HyperAI

Covering Math, Code, Science, and Puzzles: A Roundup of High-Quality Reasoning Datasets to Help Reproduce DeepSeek's Powerful Reasoning Capabilities

a month ago
Information
zhaorui

Recently, the wave of interest in reasoning models triggered by DeepSeek-R1 has continued to build. On January 31, OpenAI launched a new reasoning model, o3-mini. On February 18, xAI launched Grok 3, including the reasoning-capable Grok-3 Reasoning Beta and Grok-3 mini Reasoning. On February 25, Anthropic launched its first hybrid reasoning model, Claude 3.7 Sonnet.

Indeed, as large models grow increasingly homogeneous and competition intensifies, reasoning ability has become a key measure of performance and an important direction for AI's advance toward AGI. As the ceiling of algorithm optimization comes into view and model parameters are compressed toward their limits, data quality has become one of the decisive factors in whether a model can shift from simple "answer memorization" to deep "logical reasoning."

Building a reasoning dataset is far more than piling up questions. To prevent the model from leaking information during training and cheating during testing, the training and test sets must be strictly isolated, and a dynamic update mechanism should be introduced to refresh question types regularly. For complex tasks such as mathematical proofs and code generation, dataset construction also requires carefully designed multi-step logical chains, cleverly placed hidden trap conditions, and, as far as possible, a simulation of the trial-and-error thinking process humans go through when solving problems, so that the model receives learning material closer to real application scenarios.
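The train/test isolation described above is commonly enforced with n-gram-based decontamination: any training question that shares a long word sequence with a test question is flagged and removed. Below is a minimal sketch of that idea; the function names and the 8-word n-gram threshold are illustrative assumptions, not any specific dataset's actual pipeline.

```python
def ngrams(text, n=8):
    """Return the set of lowercased word n-grams in a piece of text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def build_test_index(test_questions, n=8):
    """Collect all n-grams appearing in any test question into one fingerprint set."""
    index = set()
    for question in test_questions:
        index |= ngrams(question, n)
    return index

def is_contaminated(train_question, test_index, n=8):
    """Flag a training question that shares at least one n-gram with the test set."""
    return bool(ngrams(train_question, n) & test_index)

# Example: a paraphrased training question that embeds a test question verbatim
test_questions = [
    "Find the sum of all positive integers less than one hundred that are divisible by three."
]
index = build_test_index(test_questions)
leaked = is_contaminated(
    "Please solve: find the sum of all positive integers less than one hundred that are divisible by three.",
    index,
)
clean = is_contaminated("What is the derivative of x squared?", index)
```

In practice, pipelines layer this with exact-match deduplication and fuzzy matching, since paraphrased test questions can slip past a fixed n-gram window.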

DeepSeek's outstanding performance in the AIME mathematics competition is a vivid example. It relies on the OpenThoughts-114k dataset, which covers problems requiring step-by-step deduction across multiple logical chains. With a strict verification mechanism and a carefully arranged multi-step reasoning structure, the dataset guarantees accuracy and reliability while enabling the model to learn deeper reasoning, rather than relying solely on "memory" to answer questions.

In summary, DeepSeek's success has led to a surge in industry attention to high-quality reasoning datasets. HyperAI has compiled some of the most popular reasoning datasets, covering fields such as mathematics, code, science, and puzzles. For practitioners and researchers who want to substantially improve the reasoning capabilities of large models, these datasets are an excellent starting point.

Click to view more open source datasets:

https://go.hyper.ai/CdPJZ

Reasoning dataset summary

1. OpenThoughts-114k Reasoning Dataset

Estimated size: 922.07 MB

Download address: https://go.hyper.ai/SaAit

The dataset was released by Open Thoughts in 2025. It focuses on mathematics, code, science, and puzzles, and contains 114,000 high-quality samples. It aims to train small reasoning models that surpass existing distilled models (such as DeepSeek-R1-Distill-Qwen-32B and DeepSeek-R1-Distill-Qwen-7B) on mathematics and code reasoning tasks.

2. Bespoke-Stratos-17k Reasoning Task Dataset

Estimated size: 125 MB

Download address: https://go.hyper.ai/nLGos

Bespoke-Stratos-17k is a high-quality dataset designed for reasoning tasks. It contains questions, reasoning trajectories, and answers, covering code, mathematics, and scientific puzzles, and is intended to support the training of high-performance reasoning models. The dataset consists of 3 parts:

* Programming data: 5,000 records from APPs and TACO

* Mathematical data: 10,000 records from the AIME, MATH, and Olympiads subsets of the NuminaMATH dataset

* Science and puzzle data: 1,000 data points from STILL-2

3. Dolphin-R1 Reasoning Dataset

Estimated size: 2.24 GB

Download address: https://go.hyper.ai/grwUo

The Dolphin-R1 reasoning dataset contains about 800,000 samples. Its sources include DeepSeek-R1, Gemini Flash, and 200,000 samples provided by Dolphin Chat. It aims to supply high-quality samples for training reasoning models in the style of DeepSeek-R1, and is mainly used to improve performance on reasoning tasks covering mathematics, logic, and coding.

4. LIMO Mathematical Reasoning Benchmark Dataset

Estimated size: 4.22 MB

Download address: https://go.hyper.ai/0p72o

The LIMO mathematical reasoning benchmark dataset contains only 817 high-quality mathematical reasoning samples. Through careful selection of training examples, it aims to train and evaluate the mathematical reasoning ability of large models, improving their performance on math exams and competition benchmarks such as AIME and MATH-500.

5. NuminaMath-1.5 Mathematical Reasoning Dataset

Estimated size: 446.62 MB

Download address: https://go.hyper.ai/qVAgO

The NuminaMath-1.5 mathematical reasoning dataset targets mathematics education and competition problems. It contains about 900,000 high-quality competition-level math problems, each with a solution written in chain-of-thought (CoT) format. The problems are drawn from Chinese high-school mathematics exercises and from American and international Mathematical Olympiad competitions.
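For readers unfamiliar with the chain-of-thought format, a CoT record pairs a problem with a solution that spells out intermediate steps rather than just the final answer. The sketch below is a hypothetical illustration; the field names (`problem`, `solution`, `answer`) are assumptions, not NuminaMath-1.5's actual schema.

```python
# Hypothetical CoT-style record: the solution records each reasoning step,
# giving the model intermediate supervision instead of only the final answer.
sample = {
    "problem": "Compute 2 + 3 * 4.",
    "solution": (
        "Step 1: Multiplication precedes addition, so 3 * 4 = 12.\n"
        "Step 2: Then 2 + 12 = 14.\n"
        "Answer: 14"
    ),
    "answer": "14",
}

# The final answer can be checked against the last line of the solution.
final_line = sample["solution"].splitlines()[-1]
```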

6. OpenR1-Math-220k Mathematical Reasoning Dataset

Estimated size: 8.44 GB

Download address: https://go.hyper.ai/nuhSv

OpenR1-Math-220k is a large-scale mathematical reasoning dataset released by the Open R1 team in 2025 to fill the gap left by the unreleased DeepSeek-R1 synthetic data. It contains 220,000 high-quality math problems with reasoning traces, derived from 800,000 reasoning traces generated by DeepSeek-R1.

7. Chinese DeepSeek-R1 Distill Dataset

Estimated size: 376 MB

Download address: https://go.hyper.ai/8Podu

This is an open-source Chinese dataset distilled from the full-strength DeepSeek-R1 model. It contains not only math data but also a large amount of general-purpose data, 110K samples in total, including:

* Math: 36,987 samples

* Exam: 2,440 samples

* STEM: 12,000 samples

* General: 58,573 samples, including Ruozhiba (a Baidu Tieba forum), logical reasoning, Xiaohongshu, Zhihu, chat, etc.


The above are the reasoning datasets compiled by HyperAI. If you have resources you would like to see included on the hyper.ai official website, you are welcome to leave a message or submit a contribution to let us know!

About HyperAI

HyperAI (hyper.ai) is the leading artificial intelligence and high-performance computing community in China. We are committed to becoming the data-science infrastructure for China and providing rich, high-quality public resources for domestic developers. So far, we have:

* Provided accelerated download nodes in China for 1,200+ public datasets

* Published 300+ classic and popular online tutorials

* Interpreted 100+ AI4Science paper cases

* Supported search for 500+ related terms

* Hosted the first complete Apache TVM Chinese documentation in China

Visit the official website to start your learning journey:

https://hyper.ai