Chinese-DeepSeek-R1-Distill-data-110k: A Chinese Dataset Distilled from DeepSeek-R1

Date: 9 months ago
Size: 231.15 MB
License: Apache 2.0


This is an open-source Chinese dataset distilled from the full-strength DeepSeek-R1 model. It contains not only math data but also a large amount of general-purpose data, for a total of 110K samples.

The dataset is being released because R1 performs very strongly, and small models fine-tuned (SFT) on R1-distilled data also perform well, yet a search showed that most open-source R1-distilled datasets are in English. The R1 report also indicates that data from general scenarios was used when training the distilled models. This Chinese dataset is open-sourced to help others better reproduce the results of the R1 distilled models.

The data distribution in this Chinese dataset is as follows:

  • Math: 36,987 samples
  • Exam: 2,440 samples
  • STEM: 12,000 samples
  • General: 58,573 samples, including Ruozhiba, logical reasoning, Xiaohongshu, Zhihu, chat, and others
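
As a quick sanity check, the category counts listed above can be summed to confirm the stated 110K total and to see each category's share. This is only an arithmetic sketch over the published figures; the variable names are illustrative and not part of the dataset.

```python
# Sample counts per category, copied from the distribution listed above.
counts = {"Math": 36_987, "Exam": 2_440, "STEM": 12_000, "General": 58_573}

total = sum(counts.values())

# Share of each category as a percentage of the total, rounded to one decimal.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

print(f"total samples: {total}")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The General category makes up a little over half of the samples, which matches the description of the dataset as going beyond math data.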

Field Description:

  • input: the input prompt
  • reasoning_content: the model's reasoning (thinking) process
  • content: the model's final output
  • repo_name: the source of the data
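
The four fields above can be combined into a prompt/response pair for SFT. The following is a minimal sketch under several assumptions that the page does not confirm: that records are stored one JSON object per line (JSON Lines), and that the reasoning is wrapped in `<think>` tags before the final answer. The helper names `load_records` and `to_sft_example` and the sample record are hypothetical.

```python
import json


def load_records(lines):
    """Parse an iterable of JSON Lines strings into dicts (assumed storage format)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def to_sft_example(record):
    """Build a prompt/response pair from one record.

    Wrapping the reasoning in <think> tags is an assumed convention,
    not something the dataset description specifies.
    """
    response = (
        f"<think>\n{record['reasoning_content']}\n</think>\n\n{record['content']}"
    )
    return {
        "prompt": record["input"],
        "response": response,
        "source": record["repo_name"],
    }


# Hypothetical record using the documented field names.
sample_lines = [
    json.dumps(
        {
            "input": "1 + 1 = ?",
            "reasoning_content": "Add the two numbers.",
            "content": "2",
            "repo_name": "example-source",
        },
        ensure_ascii=False,
    )
]

examples = [to_sft_example(r) for r in load_records(sample_lines)]
print(examples[0]["response"])
```

If the extracted archive uses a different layout, only `load_records` should need to change; the field names come from the description above.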
Chinese-DeepSeek-R1-Distill-data-110k.torrent
  • Chinese-DeepSeek-R1-Distill-data-110k/
    • README.md (1.74 KB)
    • README.txt (3.48 KB)
    • data/
      • Chinese-DeepSeek-R1-Distill-110k.zip (231.15 MB)
