HyperAI

NuminaMath-CoT Mathematics Competition Problem Dataset

Date: 9 months ago
Size: 1.01 GB
Organization: AI-MO
Publish URL: huggingface.co
License: CC BY-NC-SA 3.0

* This dataset supports online use.

This dataset was released by AI-MO in 2024 and contains more than 860k math competition problem-solution pairs, each with a solution written in a Chain of Thought (CoT) reasoning format. Sources include Chinese high school math exercises as well as problems from US and international mathematical olympiads. The data was collected mainly from online exam paper PDFs and math discussion forums. The processing steps were: (a) OCR of the original PDFs, (b) segmentation into problem-solution pairs, (c) translation into English, (d) realignment into a CoT reasoning format, and (e) final answer formatting.
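As an illustration of what a CoT-formatted pair with a formatted final answer might look like, here is a minimal Python sketch. It assumes each record exposes `problem` and `solution` text fields and that the final answer is wrapped in a LaTeX `\boxed{...}` command; both are assumptions not stated on this page, and the sample record and the helper `extract_boxed_answer` are hypothetical.

```python
from typing import Optional


def extract_boxed_answer(solution: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a solution, or None."""
    marker = r"\boxed{"
    start = solution.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    while i < len(solution):
        ch = solution[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # braces never balanced


# Hypothetical record; the field names are assumptions for illustration.
record = {
    "problem": "What is the sum of the first 10 positive integers?",
    "solution": (
        "Use the formula $n(n+1)/2$ with $n = 10$.\n"
        "This gives $10 \\cdot 11 / 2 = 55$.\n"
        "The final answer is $\\boxed{55}$."
    ),
}

answer = extract_boxed_answer(record["solution"])
print(answer)
```

The helper tracks brace depth rather than using a simple regular expression, so nested LaTeX such as `\boxed{\frac{1}{2}}` is returned whole.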

NuminaMath-CoT.torrent
Seeding: 1, Downloading: 1, Completed: 74, Total Downloads: 200
  • NuminaMath-CoT/
    • README.md (1.25 KB)
    • README.txt (2.5 KB)
    • data/
      • NuminaMath-CoT.zip (1.01 GB)