PolyMath Multilingual Mathematical Reasoning Benchmark Dataset

Date: 4 days ago

Organization: Shanghai Jiao Tong University

Paper URL: 2504.18428

PolyMath is a multilingual mathematical reasoning evaluation dataset released in 2025 by Alibaba's Qwen (Tongyi Qianwen) team in collaboration with Shanghai Jiao Tong University. The related research paper is titled "PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts". The study was accepted to the NeurIPS 2025 Datasets and Benchmarks track. It aims to systematically evaluate the mathematical understanding, reasoning depth, and cross-lingual consistency of large language models in multilingual settings.

The dataset contains 500 high-quality mathematical reasoning questions per language, with 125 questions at each of 4 difficulty levels. Every question is provided in 18 parallel language versions spanning both high-resource and low-resource languages, which together cover more than 75% of the world's native speakers. Difficulty ranges from basic K-12 mathematics to Olympiad-level and frontier mathematics, which makes the benchmark multi-dimensional and highly discriminative.

Dataset distribution:

  • Number and distribution of questions: Each language offers 125 questions at each difficulty level, forming a balanced difficulty composition.
  • Difficulty classification criteria: Divided into four levels based on "Thought Depth" and "Knowledge Breadth":
    • Level 1: Basics (K–12)
    • Level 2: Advanced (High School to Upper Grades)
    • Level 3: High difficulty (Olympiad level)
    • Level 4: Cutting Edge (Advanced Mathematics and Research-Level Reasoning)
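
The balanced layout described above implies some simple arithmetic, sketched below in Python. The three constants come from this page; the function names, the `(language, level)` record format, and the synthetic records are illustrative assumptions and do not reflect the dataset's actual file schema or loading API. The total assumes every language carries the full question set.

```python
from collections import Counter

# Figures taken from the dataset description above.
NUM_LANGUAGES = 18
NUM_LEVELS = 4
QUESTIONS_PER_LEVEL = 125


def expected_sizes():
    """Return (questions per language, total instances across all languages)."""
    per_language = NUM_LEVELS * QUESTIONS_PER_LEVEL
    return per_language, NUM_LANGUAGES * per_language


def is_balanced(records):
    """Check that every (language, level) cell holds exactly 125 questions.

    `records` is an iterable of (language, level) pairs. This is a
    hypothetical format chosen for illustration only.
    """
    counts = Counter(records)
    expected_cells = NUM_LANGUAGES * NUM_LEVELS
    return len(counts) == expected_cells and all(
        n == QUESTIONS_PER_LEVEL for n in counts.values()
    )


# Synthetic stand-in records; languages are numbered because the page
# does not list the 18 language codes.
records = [
    (lang, level)
    for lang in range(NUM_LANGUAGES)
    for level in range(1, NUM_LEVELS + 1)
    for _ in range(QUESTIONS_PER_LEVEL)
]

per_language, total = expected_sizes()
print(per_language, total, is_balanced(records))  # 500 9000 True
```

A check like `is_balanced` may be useful after downloading the data, since per-level accuracy is only comparable across languages when every cell has the same number of questions.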
