HyperAI

Chinese DeepSeek R1 Distill Data 110k: A Chinese Dataset Distilled from DeepSeek-R1

Date

4 months ago

Size

231.15 MB

Publish URL

huggingface.co

License

Apache 2.0

* This dataset supports online use.

This is an open-source Chinese dataset distilled from the full-scale DeepSeek-R1 model. It contains not only math data but also a large amount of general-purpose data, for a total of 110K samples.

This dataset was released because R1 itself is very strong, and small models fine-tuned (SFT) on R1-distilled data also show strong performance, yet most open-source R1 distillation datasets are in English. The R1 report also notes that some general-scenario data was used when distilling the smaller models. To help the community better reproduce the results of R1-distilled models, this Chinese dataset has been open-sourced.

The data distribution in this Chinese dataset is as follows:

  • Math: 36,987 samples in total
  • Exam: 2,440 samples in total
  • STEM: 12,000 samples in total
  • General: 58,573 samples in total, including Ruozhiba (弱智吧), logical reasoning, Xiaohongshu, Zhihu, chat, etc.

Field description (see the loading example below):

  • input: the original question or prompt
  • reasoning_content: the reasoning process (chain of thought) distilled from R1
  • content: the final answer / output
  • repo_name: the data source
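
As a minimal sketch of how the fields can be read with the Hugging Face `datasets` library: the repo id, file name, and JSON Lines format below are assumptions and may need to be adjusted to the actual release.

```python
# Minimal sketch: load the distilled data and read the fields described above.
# Repo id, local path, and file format are assumptions, not confirmed by the release.
from datasets import load_dataset

# Option A: load directly from the Hugging Face Hub (repo id is an assumption).
# ds = load_dataset("Congliu/Chinese-DeepSeek-R1-Distill-data-110k", split="train")

# Option B: load the extracted local file (path and JSONL format are assumptions).
ds = load_dataset(
    "json",
    data_files="data/Chinese-DeepSeek-R1-Distill-110k.jsonl",
    split="train",
)

sample = ds[0]
print(sample["input"])              # the original question / prompt
print(sample["reasoning_content"])  # R1's chain of thought
print(sample["content"])            # the final distilled answer
print(sample["repo_name"])          # which source the sample came from
```

The `reasoning_content` / `content` split mirrors how R1-style APIs separate the thinking trace from the final answer, so SFT pipelines can decide whether to train on the reasoning, the answer, or both.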
Chinese-DeepSeek-R1-Distill-data-110k.torrent
  • Chinese-DeepSeek-R1-Distill-data-110k/
    • README.md
      1.74 KB
    • README.txt
      3.48 KB
    • data/
      • Chinese-DeepSeek-R1-Distill-110k.zip
        231.15 MB