CHIMERA General Reasoning Synthetic Dataset

Date: 3 hours ago

Paper URL: 2603.00889

License: Apache 2.0

CHIMERA is a synthetic reasoning dataset designed specifically for training reasoning models. It was released with the paper "CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning". The dataset covers a wide range of STEM subjects and provides long chain-of-thought (CoT) trajectories.

The dataset contains 9,225 questions across 8 subjects (mathematics, computer science, chemistry, physics, literature, history, biology, and linguistics). All examples are generated by a large language model (LLM) and validated automatically, with no manual annotation.

Discipline Distribution:

  • Mathematics: 4,452
  • Computer Science: 1,303
  • Chemistry: 1,102
  • Physics: 742
  • Literature: 504
  • History: 422
  • Biology: 383
  • Linguistics: 317
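
As a quick sanity check on the distribution above, the following minimal sketch sums the per-subject counts and computes each subject's share of the dataset. The counts are copied from the list; the variable names (`counts`, `total`, `shares`) are illustrative and are not part of any official CHIMERA tooling.

```python
# Per-subject question counts, copied from the distribution list above.
counts = {
    "Mathematics": 4452,
    "Computer Science": 1303,
    "Chemistry": 1102,
    "Physics": 742,
    "Literature": 504,
    "History": 422,
    "Biology": 383,
    "Linguistics": 317,
}

# The per-subject counts should add up to the stated dataset size.
total = sum(counts.values())

# Share of each subject as a percentage of the whole dataset.
shares = {subject: 100 * n / total for subject, n in counts.items()}

for subject, n in counts.items():
    print(f"{subject:<17} {n:>5}  {shares[subject]:5.1f}%")
print(f"{'Total':<17} {total:>5}")
```

The counts sum to the stated 9,225 questions, and mathematics alone accounts for roughly 48% of them, so anyone sampling from the dataset for a balanced evaluation may want to reweight by subject.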
