HyperAI

MIT scientists, working with researchers from King Abdullah University of Science and Technology (KAUST) and the company HUMAIN, have created MathNet, the world's largest collection of Olympiad-level math problems. The initiative makes over 30,000 expert-authored problems and solutions publicly available for the first time. The dataset spans 47 countries, 17 languages, and 143 competitions over four decades, making it five times larger than any existing comparable resource.

Previous datasets focused heavily on competitions from the United States and China. In contrast, MathNet captures global mathematical traditions by including text- and image-based problems from six continents. The team sourced 1,595 PDF volumes totaling more than 25,000 pages, a significant portion of which came from a personal archive that Navid Safaei, a figure in the International Mathematical Olympiad (IMO) community, has maintained since 2006. Unlike other collections that rely on informal community forums, MathNet draws exclusively from official national competition booklets, ensuring that solutions are peer-reviewed, multi-page, and written by experts.

Shaden Alshammari, an MIT PhD student and lead author, emphasized that the goal is to provide a centralized resource for students who often train in isolation. The dataset has been validated by a team of over 30 human evaluators from countries including Armenia, Russia, Ukraine, Vietnam, and Poland. The researchers are also coordinating with the IMO foundation to share the data directly with the organization.

Beyond its utility for students, MathNet serves as a rigorous benchmark for artificial intelligence. While some frontier models have achieved gold-medal performance on standard tests, MathNet reveals uneven capabilities. The top-performing model, GPT-5, averaged 69.3 percent accuracy on the main benchmark, failing nearly one-third of the problems. Performance drops significantly when problems include figures, highlighting visual reasoning as a persistent weakness. Furthermore, several open-source models scored zero percent on Mongolian-language problems, exposing a lack of diversity in current AI systems compared to human models, which perform equally well across languages.

The dataset also tests whether AI can recognize when two problems share the same underlying mathematical structure. Tests of state-of-the-art embedding models showed that they identified the correct match only about 5 percent of the time on the first try, often ranking unrelated problems as more similar than equivalent ones. Retrieval-augmented generation tests showed that providing relevant structural examples improved model performance by up to 12 percentage points, while irrelevant retrieval degraded results in roughly 22 percent of cases.

MathNet is set to be presented at the International Conference on Learning Representations in Brazil. By offering a diverse, high-quality, and standardized collection of global math problems, the project aims to improve mathematical reasoning in both human learners and AI systems, ensuring that no single cultural perspective dominates the training data.
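For readers curious how a "correct match on the first try" figure is typically computed, the sketch below shows one common way to score it, usually called recall@1: for each problem, the nearest other problem by cosine similarity is retrieved and counted as a hit only if it is the known equivalent. This is an illustration under stated assumptions, not the MathNet team's evaluation code; the function names, the toy vectors, and the pairing dictionary are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recall_at_1(embeddings, equivalent):
    """Fraction of queries whose single nearest neighbour is the true match.

    embeddings: dict mapping problem id -> embedding vector (hypothetical data)
    equivalent: dict mapping query id -> id of its structurally equivalent problem
    """
    hits = 0
    for query_id, target_id in equivalent.items():
        # Every problem except the query itself is a retrieval candidate.
        candidates = [pid for pid in embeddings if pid != query_id]
        best = max(
            candidates,
            key=lambda pid: cosine(embeddings[query_id], embeddings[pid]),
        )
        hits += int(best == target_id)
    return hits / len(equivalent)


# Toy vectors, invented for illustration only.
toy_embeddings = {
    "A": [1.0, 0.0, 0.0],
    "A_equiv": [0.9, 0.1, 0.0],  # embedded close to A, as hoped
    "B": [0.0, 1.0, 0.0],
    "B_equiv": [0.0, 0.1, 1.0],  # equivalent to B but embedded far away
    "C": [0.0, 0.9, 0.3],        # unrelated problem that sits close to B
}
toy_pairs = {"A": "A_equiv", "B": "B_equiv"}

score = recall_at_1(toy_embeddings, toy_pairs)
print(f"recall@1 = {score:.2f}")
```

In this toy setup the query "B" retrieves the unrelated problem "C" ahead of its true equivalent, which is the failure pattern the article describes: an unrelated problem ranked as more similar than an equivalent one.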

MIT scientists open world's largest Olympiad math collection