AM-DeepSeek-R1-Distilled-1.4M Large-scale General Reasoning Task Dataset
AM-DeepSeek-R1-Distilled-1.4M is a large-scale general reasoning dataset released by am-team in March 2025. The accompanying paper is "1.4 Million Open-Source Distilled Reasoning Dataset to Empower Large Language Model Training".
The dataset contains about 1.4 million entries covering mathematics, code, scientific Q&A, and general chat. The data was carefully selected, semantically deduplicated, and strictly cleaned so that the remaining problems are both high in quality and challenging. Each entry includes a detailed thinking trace, which gives a model a worked example of the reasoning process and helps it learn to produce solutions to complex reasoning tasks.
The dataset is intended as a resource for training and optimizing the reasoning capabilities of large language models, with the goal of improving their performance in key areas such as mathematics, code, and scientific Q&A.
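To illustrate what semantic deduplication of questions involves, here is a minimal sketch. It is not the pipeline used to build the dataset, which this description does not specify. As a stand-in for a real sentence-embedding model, it compares bag-of-words vectors by cosine similarity; the function names and the 0.8 threshold are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens (a stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(questions, threshold=0.8):
    """Keep a question only if it is below the similarity threshold
    against every question already kept (threshold is an assumed value)."""
    kept, kept_vectors = [], []
    for question in questions:
        vector = bag_of_words(question)
        if all(cosine(vector, other) < threshold for other in kept_vectors):
            kept.append(question)
            kept_vectors.append(vector)
    return kept


if __name__ == "__main__":
    sample = [
        "What is the derivative of x squared?",
        "what is the derivative of x squared",
        "Write a function that reverses a string.",
    ]
    for question in deduplicate(sample):
        print(question)
```

The pairwise comparison here is quadratic in the number of questions, so a collection of this size would in practice need an approximate nearest-neighbor index or a similar technique; the sketch only shows the filtering logic.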