HyperAI

MathX-5M Mathematical Reasoning Dataset

Date

2 months ago

Publish URL

huggingface.co

License

MIT

MathX is a mathematical reasoning dataset designed for instruction tuning and for fine-tuning existing models to strengthen their step-by-step reasoning. It is presented as the largest and most comprehensive public corpus of mathematical reasoning data to date.

The dataset includes 5 million carefully selected step-by-step reasoning examples, each containing a problem statement, a detailed reasoning process, and a verified correct solution. The examples cover arithmetic and number theory, algebra and polynomials, geometry and trigonometry, and calculus and analysis.
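The three-part structure of each example can be sketched as a small record type. This is only an illustration: the field names `problem`, `reasoning`, and `solution`, the `MathExample` class, and the `to_prompt_and_target` helper are assumptions made here, not the dataset's documented column names or an official loader.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MathExample:
    """One dataset example. Field names are hypothetical, not the real schema."""
    problem: str    # problem statement
    reasoning: str  # detailed step-by-step reasoning process
    solution: str   # verified final answer


def to_prompt_and_target(example: MathExample) -> tuple[str, str]:
    """Turn an example into a (prompt, target) pair for instruction tuning."""
    prompt = f"Problem: {example.problem}\nSolve step by step."
    target = f"{example.reasoning}\nFinal answer: {example.solution}"
    return prompt, target


# Hand-written toy example, not taken from the dataset.
toy = MathExample(
    problem="What is 2 + 3?",
    reasoning="Add 2 and 3 to get 5.",
    solution="5",
)
prompt, target = to_prompt_and_target(toy)
print(prompt)
print(target)
```

Keeping the reasoning and the final answer in separate fields makes it possible to train on the full chain while scoring only the final answer.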

Problem complexity distribution

  • Basic (30%): Basic mathematical concepts and operations
  • Intermediate (30%): Multi-step problems requiring reasoning chains
  • Advanced (40%): Complex mathematical challenges and proofs
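As a rough guide, the percentages above can be converted into nominal example counts. The sketch below assumes a total of exactly 5,000,000 examples (the page only says "5 million") and uses hypothetical level keys; the actual per-level counts in the dataset may differ.

```python
# Nominal total, assuming "5 million" means exactly 5,000,000 examples.
TOTAL_EXAMPLES = 5_000_000

# Shares in whole percent, taken from the complexity distribution list.
LEVEL_SHARES = {"basic": 30, "intermediate": 30, "advanced": 40}


def expected_counts(total: int, shares_percent: dict[str, int]) -> dict[str, int]:
    """Convert whole-percent shares into nominal example counts."""
    if sum(shares_percent.values()) != 100:
        raise ValueError("shares must sum to 100 percent")
    return {level: total * pct // 100 for level, pct in shares_percent.items()}


counts = expected_counts(TOTAL_EXAMPLES, LEVEL_SHARES)
for level, count in counts.items():
    print(f"{level}: {count:,}")
# basic: 1,500,000
# intermediate: 1,500,000
# advanced: 2,000,000
```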

Dataset features:

  • Diversity: Comprehensive coverage of mathematics from basic arithmetic to advanced calculus
  • Quality: Multi-stage screening and verification process
  • Reasoning: Step-by-step solutions that spell out the underlying mathematical ideas
  • Accuracy: Answers verified for correctness, with reinforcement learning used in the verification process