MUSTARDSAUCE Mathematical Theorem Problem Dataset
Date
Size
Publish URL
Categories

Researchers from City University of Hong Kong, Sun Yat-sen University, Huawei Noah's Ark Lab and other institutions proposed a unified mathematical reasoning data synthesis framework MUSTARD, which can generate large amounts of high-quality mathematical reasoning data that is correct, human-readable and understandable.This dataset is the open source MUSTARDSAUCE dataset for research.Each of these data contains the problem description and multi-step solution in natural language, as well as the problem description and multi-step solution in the dual formal language Lean 3. The MUSTARDSAUCE data includes math word problems and theorem proof problems, covering difficulty levels from elementary school to higher education. The number of reasoning steps in the questions increases with the difficulty of the questions. The most difficult questions require about 30 steps to solve, about 20 Lean 3 tactics.