Date

a year ago

Size

107.46 MB

Organization

Paper URL

www.bespokelabs.ai

Tags

Bespoke-Stratos-17k is a high-quality dataset for reasoning tasks, released by the Bespoke Labs team in 2025. The accompanying blog post is "Bespoke-Stratos: The unreasonable effectiveness of reasoning distillation". The dataset was generated with an improved version of Berkeley's Sky-T1 data pipeline, using reasoning data distilled from DeepSeek-R1, and is designed to support the training of high-performance reasoning models. It contains questions, reasoning traces, and answers, covering areas such as code, mathematics, and scientific puzzles.

With the Bespoke Curator tool, the reasoning dataset was generated in just 1.5 hours at a cost of around US$800. Using DeepSeek-R1 as the teacher reasoning model simplifies data generation, because no additional formatting steps are needed. In addition, using gpt-4o-mini to filter out incorrect math solutions raised the retention rate of correct math solutions from 25% to 73%.

The dataset consists of 3 parts: programming data (5,000 examples from APPS and TACO), mathematics data (10,000 examples from the AIME, MATH, and Olympiads subsets of the NuminaMATH dataset), and science and puzzle data (1,000 examples from STILL-2). These data were used to train two reasoning models, Bespoke-Stratos-32B and Bespoke-Stratos-7B, which perform well on mathematics and code reasoning benchmarks, surpassing previous models.
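To make the figures in the description concrete, the following is a minimal Python sketch, not the Bespoke Labs pipeline. It assumes the rounded per-source counts quoted above, and the `Example` class with its field names (`question`, `reasoning_trace`, `answer`) is a hypothetical illustration of the record structure rather than the dataset's actual schema. It computes each part's share of the listed counts and the relative gain in retained correct math solutions (25% to 73%).

```python
from dataclasses import dataclass

# Rounded counts as quoted in the description (assumed, not exact).
COMPOSITION = {
    "code (APPS, TACO)": 5_000,
    "math (NuminaMATH: AIME, MATH, Olympiads)": 10_000,
    "science and puzzles (STILL-2)": 1_000,
}


@dataclass
class Example:
    """Hypothetical record layout; the real column names may differ."""
    question: str
    reasoning_trace: str
    answer: str


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each part's fraction of the summed counts."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def retention_gain(before: float, after: float) -> float:
    """Ratio of the retention rate after filtering changes to the rate before."""
    return after / before


if __name__ == "__main__":
    sample = Example(
        question="What is 2 + 2?",
        reasoning_trace="Adding 2 and 2 gives 4.",
        answer="4",
    )
    print(sample)
    print("listed examples:", sum(COMPOSITION.values()))
    for name, share in shares(COMPOSITION).items():
        print(f"{name}: {share:.1%}")
    print(f"retention gain: {retention_gain(0.25, 0.73):.2f}x")
```

The rounded counts listed in the description sum to 16,000, so the shares here are relative to that figure rather than to the "17k" in the dataset name.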

Bespoke-Stratos-17k.torrent

Seeding: 1 | Downloading: 0 | Completed: 190 | Total Downloads: 311

Bespoke-Stratos-17k/
- README.md
  2.05 KB
- README.txt
  4.09 KB

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.

Related Datasets

Nemotron Personas France (French Synthetic Personas Dataset)

2 months ago

CHIMERA General Inference Synthetic Dataset

4 months ago

Nemotron-Personas-Brazil Brazilian Synthetic Character Dataset

5 months ago

RoVid-X Robot Video Generation Dataset

2 months ago

DeepPlanning Long-Term Planning Capability Assessment Dataset

5 months ago

Patient Segmentation Dataset

5 months ago

Nemotron-Math-v2 Mathematical Inference Dataset

5 months ago

TxT360-3efforts Multi-Task Inference Dataset

6 months ago
