HyperAI

Bespoke Stratos 17k Reasoning Task Dataset

Date: 2 months ago

Size: 107.46 MB

Organization:

Publish URL: huggingface.co


Bespoke-Stratos-17k is a reasoning dataset released by the Bespoke Labs team in 2025 and described in the blog post "Bespoke-Stratos: The unreasonable effectiveness of reasoning distillation". It was built with an improved version of Berkeley's Sky-T1 data pipeline, using reasoning data distilled from DeepSeek-R1, and is intended for training high-performance reasoning models. Each sample contains a question, a reasoning trace, and an answer, and the samples cover code, mathematics, and science and puzzle problems. With the Bespoke Curator tool, the reasoning dataset was generated in about 1.5 hours at a cost of roughly US$800. Using DeepSeek-R1 as the teacher reasoning model simplified data generation, because no additional formatting steps were needed. In addition, using gpt-4o-mini to filter out incorrect math solutions raised the retention rate of correct math solutions from 25% to 73%.
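The filtering step can be pictured as a judge that keeps only the samples whose final answer is accepted as correct, with the retention rate being the kept fraction. The following is a minimal sketch under stated assumptions, not the Bespoke Labs pipeline: `Sample`, `judge_is_correct`, and `filter_with_judge` are hypothetical names, the toy data is invented for illustration, and the judge is a plain string comparison standing in for an LLM call such as gpt-4o-mini.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Sample:
    question: str
    reasoning: str   # reasoning trace produced by the teacher model
    answer: str      # final answer taken from the trace
    reference: str   # ground-truth answer from the source dataset


def judge_is_correct(sample: Sample) -> bool:
    """Hypothetical stand-in judge: normalized exact string match.

    A real pipeline would replace this with an LLM-based check.
    """
    return sample.answer.strip().lower() == sample.reference.strip().lower()


def filter_with_judge(
    samples: List[Sample], judge: Callable[[Sample], bool]
) -> Tuple[List[Sample], float]:
    """Keep the samples the judge accepts and report the retention rate."""
    kept = [s for s in samples if judge(s)]
    retention = len(kept) / len(samples) if samples else 0.0
    return kept, retention


# Toy data, invented for illustration only.
samples = [
    Sample("1 + 1", "...", "2", "2"),
    Sample("2 * 3", "...", " 6 ", "6"),
    Sample("10 / 4", "...", "2.5", "5/2"),  # equivalent, but not string-equal
    Sample("3 ** 2", "...", "6", "9"),      # genuinely wrong
]

kept, retention = filter_with_judge(samples, judge_is_correct)
print(f"kept {len(kept)} of {len(samples)} samples, retention = {retention:.0%}")
```

In this toy run the strict string match also discards the equivalent answer `2.5` versus `5/2`. That is one plausible way a rigid checker can lose correct solutions that a more flexible judge would retain, although the source does not spell out the mechanism.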

The dataset consists of three parts: coding data (5,000 samples from APPS and TACO), mathematics data (10,000 samples from the AIME, MATH, and Olympiads subsets of the NuminaMATH dataset), and science and puzzle data (1,000 samples from STILL-2). The data was used to train two reasoning models, Bespoke-Stratos-32B and Bespoke-Stratos-7B, which perform well on mathematics and code reasoning benchmarks, surpassing previous models.
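As a quick check on the composition, the counts quoted above can be tallied to see each part's share of the whole. This is a small sketch that takes the figures exactly as stated in the description; the dictionary keys are labels chosen here for readability, not field names from the dataset.

```python
# Sample counts as stated in the dataset description.
composition = {
    "code (APPS, TACO)": 5_000,
    "math (NuminaMATH: AIME, MATH, Olympiads)": 10_000,
    "science and puzzles (STILL-2)": 1_000,
}

total = sum(composition.values())
shares = {name: count / total for name, count in composition.items()}

for name, count in composition.items():
    print(f"{name}: {count:,} samples ({shares[name]:.1%})")
print(f"total: {total:,} samples")
```

On these figures, mathematics accounts for five eighths of the listed samples and code for a little under a third.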

Bespoke-Stratos-17k.torrent
  • Bespoke-Stratos-17k/
    • README.md (2.05 KB)
    • README.txt (4.09 KB)
    • data/
      • Bespoke-Stratos-17k.zip (107.46 MB)