AM-DeepSeek-R1-Distilled-1.4M Large-scale General Reasoning Task Dataset

Date: 7 months ago
Size: 47.22 GB
Organization:
Publish URL: github.com
Paper URL: arxiv.org

AM-DeepSeek-R1-Distilled-1.4M is a large-scale general reasoning dataset released by am-team in March 2025. It accompanies the paper "1.4 Million Open-Source Distilled Reasoning Dataset to Empower Large Language Model Training".

The dataset contains about 1.4 million entries covering mathematics, code, scientific Q&A, and general chat. The data has been carefully selected, semantically deduplicated, and strictly cleaned so that the remaining problems are both high in quality and challenging. Each entry includes a detailed thinking trace, which gives a model worked examples of the reasoning process and helps it learn to produce solutions to complex reasoning tasks.

The dataset is intended for training and improving the reasoning capabilities of large language models, particularly in mathematics, code, and scientific Q&A.

AM-DeepSeek-R1-Distilled-1.4M.torrent
Seeding: 1 | Downloading: 0 | Completed: 71 | Total Downloads: 159
  • AM-DeepSeek-R1-Distilled-1.4M/
    • README.md (1.8 KB)
    • README.txt (3.6 KB)
    • data/
      • main.zip (10.32 GB)
      • main/
        • README.md (10.32 GB)
        • am_0.5M.jsonl (23.84 GB)
        • am_0.5M.jsonl.zst (25.76 GB)
        • am_0.9M.jsonl (44.19 GB)
        • am_0.9M.jsonl.zst (47.19 GB)
        • am_0.9M_sample_1k.jsonl (47.21 GB)
        • am_0.9M_sample_1k.jsonl.zst (47.22 GB)
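
The listing above ships the data as `.jsonl` files, so a quick way to get oriented is to stream the 1k sample and see which fields each entry carries. The following is a minimal sketch, assuming the usual JSONL convention of one JSON object per line; the record schema is not described on this page, so the code only counts top-level keys rather than assuming any field names. The helper names `iter_jsonl` and `count_top_level_keys` and the local file path are illustrative, not part of the dataset.

```python
import json
from collections import Counter
from itertools import islice
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator


def iter_jsonl(path) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-empty line, without loading the whole file."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_top_level_keys(records: Iterable[Dict[str, Any]], limit: int = 1000) -> Counter:
    """Count how often each top-level key appears in the first `limit` records."""
    counts: Counter = Counter()
    for record in islice(records, limit):
        counts.update(record.keys())
    return counts


if __name__ == "__main__":
    # Assumed local path after downloading and extracting the sample file.
    sample_path = Path("am_0.9M_sample_1k.jsonl")
    if sample_path.exists():
        for key, n in count_top_level_keys(iter_jsonl(sample_path)).most_common():
            print(f"{key}: {n}")
```

Reading line by line keeps memory use flat, which matters for the multi-gigabyte files in the listing. The `.jsonl.zst` variants would need to be decompressed first (for example with the `zstd` command-line tool) before this sketch applies.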
