
QwQ-LongCoT-130K Fine-tuning Dataset

The QwQ-LongCoT-130K dataset is a supervised fine-tuning (SFT) dataset designed for training large language models (LLMs) with o1-style long chain-of-thought reasoning. Its defining feature is that it does not merely pursue long responses: each response is expected to exhibit an in-depth thinking process and explicit logical reasoning. The dataset contains about 130,000 instances, each a response generated with the QwQ-32B-Preview model.
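
As a quick way to get a feel for the data, the dataset can be loaded with the Hugging Face datasets library. This is a minimal sketch; the Hub repository ID amphora/QwQ-LongCoT-130K and the column layout are assumptions here, so check the dataset card for the actual names.

```python
# Minimal sketch: load the dataset and inspect one instance.
# The repo ID "amphora/QwQ-LongCoT-130K" is an assumption; verify on the Hub.
from datasets import load_dataset

ds = load_dataset("amphora/QwQ-LongCoT-130K", split="train")
print(ds)        # row count and column names
print(ds[0])     # one seed instruction with its long chain-of-thought response
```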

The QwQ-LongCoT-130K dataset consists of about 90,000 samples drawn from NuminaMath and about 43,000 samples generated with Magpie. The creators plan to add more Magpie data once additional computing resources become available. In addition, the length distribution of QwQ-LongCoT-130K skews longer than that of the top_300k_longer_conversations subset of Magpie-Ultra.
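
The length claim can be sanity-checked locally by computing percentile statistics over the responses. A minimal sketch, assuming the dataset exposes a response column (the column name is an assumption; adapt it to the actual schema):

```python
# Sketch: summarize the response-length distribution in characters.
# The column name "response" is an assumption; adapt to the actual schema.
import numpy as np
from datasets import load_dataset

ds = load_dataset("amphora/QwQ-LongCoT-130K", split="train")
lengths = np.array([len(r) for r in ds["response"]])

for p in (50, 90, 99):
    print(f"p{p}: {np.percentile(lengths, p):,.0f} characters")
print(f"mean: {lengths.mean():,.0f} characters")
```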

One of the challenges in building the QwQ-LongCoT-130K dataset was curating seed instructions that genuinely warrant long chain-of-thought reasoning. The creators wanted to avoid trivial seed questions, such as "What color is the sky?", and also wanted the data to be free of copyright issues. The seed instructions were therefore collected in two ways: one part comes from the NuminaMath-CoT dataset, which contains 860,000 math problems and their answers, and the other part was extracted from the QwQ-32B-Preview model itself via the Magpie method.
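
The Magpie method exploits the fact that an aligned chat model, given only the pre-query portion of its chat template, will complete it with a plausible user instruction. Below is a minimal sketch, assuming QwQ-32B-Preview's ChatML-style template; the exact template string and the sampling settings are assumptions, not the dataset creators' exact pipeline.

```python
# Sketch of Magpie-style seed-instruction extraction.
# The pre-query string below follows Qwen's ChatML format; verify it against
# the model's actual chat template before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B-Preview"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Feed only the template up to where the user's content would begin;
# the model then "fills in" an instruction of its own.
pre_query = "<|im_start|>user\n"
inputs = tok(pre_query, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=1.0)

instruction = tok.decode(out[0][inputs["input_ids"].shape[1]:],
                         skip_special_tokens=True)
print(instruction)
```

Because the instructions are sampled from the model itself rather than scraped from the web, this approach also sidesteps the copyright concerns mentioned above.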

QwQ-LongCoT-130K.torrent
  • QwQ-LongCoT-130K/
    • README.md (2.08 KB)
    • README.txt (4.16 KB)
    • data/
      • QwQ-LongCoT-.zip (357.27 MB)