OpenThoughts2-1M Reasoning Dataset
OpenThoughts2-1M is an open-source reasoning dataset released by Open Thoughts in 2025. The associated paper is "OpenThoughts: Data Recipes for Reasoning Models".
The dataset builds on OpenThoughts-114k, augmenting it with existing datasets such as OpenR1 and with additional math and code reasoning data. It contains 1 million high-quality examples covering math, science, code, and puzzles. The OpenThinker2 model trained on this dataset performs comparably to the DeepSeek-R1-Distill model.

Data Structure
open-thoughts2M.torrent