CHIMERA General Reasoning Synthetic Dataset
CHIMERA is a synthetic dataset designed specifically for training reasoning models. The related research paper is "CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning". The dataset covers a wide range of subjects, mostly in STEM, and provides long chain-of-thought (CoT) trajectories.
This dataset contains 9,225 questions across 8 subjects (mathematics, computer science, chemistry, physics, literature, history, biology, and linguistics). All examples are generated by a large language model (LLM) and validated automatically, with no manual annotation.
Discipline Distribution:
- Mathematics: 4,452
- Computer Science: 1,303
- Chemistry: 1,102
- Physics: 742
- Literature: 504
- History: 422
- Biology: 383
- Linguistics: 317
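
As a quick sanity check on the distribution above, the following minimal sketch sums the per-subject counts and computes each subject's share of the total. The counts are copied from the list; the names `counts`, `total`, and `shares` are illustrative only and are not part of the dataset or any official tooling.

```python
# Per-subject question counts, copied from the distribution list above.
counts = {
    "Mathematics": 4452,
    "Computer Science": 1303,
    "Chemistry": 1102,
    "Physics": 742,
    "Literature": 504,
    "History": 422,
    "Biology": 383,
    "Linguistics": 317,
}

# The per-subject counts should add up to the stated dataset size.
total = sum(counts.values())

# Share of each subject as a percentage of the total, rounded to one decimal.
shares = {subject: round(100 * n / total, 1) for subject, n in counts.items()}

for subject, n in counts.items():
    print(f"{subject:<17}{n:>6,}{shares[subject]:>7.1f}%")
print(f"{'Total':<17}{total:>6,}")
```

The counts add up to the stated 9,225 questions, and mathematics alone accounts for a little under half of the dataset.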