HyperAI

OpenThoughts2-1M Reasoning Dataset

Date

13 days ago

Size

6.91 GB

Publish URL

huggingface.co


OpenThoughts2-1M is an open-source reasoning dataset released by Open Thoughts in 2025. The related paper is "OpenThoughts: Data Recipes for Reasoning Models".

The dataset builds on OpenThoughts-114k, adding existing datasets such as OpenR1 along with other math and code reasoning data. It contains 1 million high-quality examples covering math, science, code, and puzzles. The OpenThinker2 model trained on this dataset performs comparably to the DeepSeek-R1-Distill model.

Data Structure

open-thoughts2M.torrent
  • open-thoughts2M/
    • README.md
      1.27 KB
    • README.txt
      2.54 KB
    • data/
      • open-thoughts2M.zip
        6.91 GB
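The archive can be inspected and unpacked with Python's standard `zipfile` module. The following is a minimal sketch that assumes the torrent has finished downloading to the `open-thoughts2M/data/open-thoughts2M.zip` path from the tree above. It makes no assumption about the format of the files inside the zip, and the helper names `list_archive` and `extract_archive` are hypothetical.

```python
import zipfile
from pathlib import Path


def list_archive(zip_path):
    """Return (name, uncompressed size in bytes) for every file entry in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return [(info.filename, info.file_size)
                for info in zf.infolist() if not info.is_dir()]


def extract_archive(zip_path, dest_dir):
    """Extract the whole archive into dest_dir and return the extracted paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        return [dest / name for name in zf.namelist()]


if __name__ == "__main__":
    # Hypothetical local path after the torrent download completes.
    archive = Path("open-thoughts2M/data/open-thoughts2M.zip")
    if archive.exists():
        for name, size in list_archive(archive):
            print(f"{size / 1e9:8.2f} GB  {name}")
    else:
        print(f"Archive not found: {archive}")
```

Listing the entries first is cheap because it only reads the zip's central directory. That lets you check which files are inside before committing disk space to extracting the full 6.91 GB archive.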