HyperAI

AM-DeepSeek-R1-Distilled-1.4M Large-scale General Reasoning Task Dataset

Date: 2 months ago

Size: 47.22 GB

Organization:

Publish URL: github.com

AM-DeepSeek-R1-Distilled-1.4M is a large-scale general reasoning dataset released by am-team in March 2025. The accompanying paper is "1.4 Million Open-Source Distilled Reasoning Dataset to Empower Large Language Model Training".

The dataset contains about 1.4 million entries covering mathematics, code, scientific Q&A, and general chat. The data was carefully selected, semantically deduplicated, and strictly cleaned to keep it both high-quality and challenging. Each entry includes a detailed thinking trace, which gives a model worked examples of the reasoning process and helps it learn to produce solutions to complex reasoning tasks. The dataset is intended for training and improving the reasoning capabilities of large language models, particularly in mathematics, code, and scientific Q&A.

AM-DeepSeek-R1-Distilled-1.4M.torrent
  • AM-DeepSeek-R1-Distilled-1.4M/
    • README.md (1.8 KB)
    • README.txt (3.6 KB)
    • data/
      • main.zip (10.32 GB)
      • main/
        • README.md (10.32 GB)
        • am_0.5M.jsonl (23.84 GB)
        • am_0.5M.jsonl.zst (25.76 GB)
        • am_0.9M.jsonl (44.19 GB)
        • am_0.9M.jsonl.zst (47.19 GB)
        • am_0.9M_sample_1k.jsonl (47.21 GB)
        • am_0.9M_sample_1k.jsonl.zst (47.22 GB)
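
The `.jsonl` files listed above are large, so it can help to stream them line by line instead of loading a whole file into memory. The following is a minimal sketch using only the Python standard library. It assumes each non-empty line is one JSON object, which is the usual JSONL convention; the page does not document the record schema, so the helper names `iter_jsonl` and `peek_keys` are hypothetical, and the code only reports whatever top-level keys it finds rather than assuming any field names.

```python
import json
from itertools import islice
from pathlib import Path


def iter_jsonl(path, limit=None):
    """Yield parsed records from a JSONL file, one per non-empty line.

    Assumption: every non-empty line is a complete JSON value.
    `limit=None` streams the whole file; an integer stops after that many records.
    """
    with Path(path).open("r", encoding="utf-8") as handle:
        records = (json.loads(line) for line in handle if line.strip())
        yield from islice(records, limit)


def peek_keys(path, limit=5):
    """Return the sorted top-level keys seen in the first `limit` records.

    Non-object records (lists, strings, numbers) are skipped, since the
    schema of the dataset is not stated in the listing.
    """
    keys = set()
    for record in iter_jsonl(path, limit):
        if isinstance(record, dict):
            keys.update(record)
    return sorted(keys)
```

As a usage example, once the small sample file has been extracted locally, `peek_keys("am_0.9M_sample_1k.jsonl")` would list the field names present in its first five records (the local path here is an assumption). The `.jsonl.zst` variants are Zstandard-compressed and would need to be decompressed first, for example with the `zstd` command-line tool, before this sketch applies.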