U-MATH Mathematical Reasoning Dataset
The U-MATH dataset is a benchmark test set designed to evaluate the mathematical reasoning capabilities of large language models (LLMs). It was created by Toloka AI and Gradarius in 2024 and is described in the paper "U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs". The dataset contains 1,100 unpublished college-level math problems derived from real teaching materials, covering six core topics: precalculus, algebra, differential calculus, integral calculus, multivariable calculus, and sequences and series.
A notable feature of the U-MATH dataset is its multimodal questions. About 20% of the questions involve visual elements such as graphs and charts, so a model must interpret and reason about graphical information in addition to the problem text. Each record includes a question ID, a topic label, a flag indicating whether the question contains an image, the image data, the question statement, and the correct answer, which together provide the basis for evaluating a model's mathematical reasoning ability.
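As a rough illustration of the record layout described above, the following Python sketch models one question and computes two simple statistics: the share of questions with images and the number of questions per topic. The class name, field names, and sample records are assumptions made up for this example; they are not taken from the published dataset and may differ from its actual column names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UMathRecord:
    # Hypothetical field names mirroring the features listed above;
    # the real dataset's column names may differ.
    question_id: str
    topic: str
    has_image: bool
    image: Optional[bytes]  # raw image data, or None for text-only questions
    problem_statement: str
    correct_answer: str


def visual_share(records: List[UMathRecord]) -> float:
    """Return the fraction of records that include an image."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.has_image) / len(records)


def count_by_topic(records: List[UMathRecord]) -> Counter:
    """Count how many records belong to each topic label."""
    return Counter(r.topic for r in records)


# Made-up sample records for illustration only (not real U-MATH problems).
sample = [
    UMathRecord("q1", "algebra", False, None,
                "Solve x^2 - 4 = 0.", "x = -2 or x = 2"),
    UMathRecord("q2", "differential calculus", False, None,
                "Differentiate x^3.", "3x^2"),
    UMathRecord("q3", "integral calculus", True, b"<image bytes>",
                "Find the area of the shaded region.", "4"),
    UMathRecord("q4", "algebra", False, None,
                "Factor x^2 - 1.", "(x - 1)(x + 1)"),
    UMathRecord("q5", "sequences and series", False, None,
                "Sum the series 1/2^n for n >= 1.", "1"),
]

if __name__ == "__main__":
    print(f"Share of visual questions: {visual_share(sample):.0%}")
    print(count_by_topic(sample).most_common())
```

Separating text-only and visual questions in this way makes it possible to report scores for each group, which is useful when the model under evaluation does not accept image input.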