HyperAI

PolyMath is a multilingual mathematical reasoning evaluation dataset released in 2025 by Alibaba's Qianwen (Qwen) team in collaboration with Shanghai Jiao Tong University. The accompanying paper, "PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts", has been accepted to the NeurIPS 2025 Datasets and Benchmarks Track. The benchmark is designed to systematically evaluate the mathematical understanding, reasoning depth, and cross-lingual consistency of large language models in multilingual settings.

Each language version contains 500 high-quality mathematical reasoning questions, 125 at each of 4 difficulty levels. The questions are provided in 18 parallel language versions spanning both high-resource and low-resource languages, which together cover over 75% of the world's native speakers; across all languages this amounts to 9,000 question instances (500 × 18). Difficulty ranges from basic K-12 mathematics to Olympiad-level problems and frontier research mathematics, yielding a high-quality, multi-dimensional, and highly discriminative evaluation suite for mathematical reasoning.

Dataset distribution:

Number and distribution of questions: Each language version contains 125 questions at each difficulty level (500 per language), giving a balanced difficulty composition.
Difficulty classification criteria: Questions are divided into four levels along two dimensions, "Thought Depth" and "Knowledge Breadth":
- Level 1: Basics (K–12)
- Level 2: Advanced (High School to Upper Grades)
- Level 3: High difficulty (Olympiad level)
- Level 4: Cutting Edge (Advanced Mathematics and Research-Level Reasoning)
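To make the balanced, parallel composition concrete, the following Python sketch enumerates a hypothetical index of the benchmark and checks the counts stated in this section. It assumes a simple record with `language`, `level`, and `question_id` fields and uses placeholder language codes (`lang_00` to `lang_17`); these names are illustrative only and are not the dataset's actual schema or language list. Because the versions are parallel, the sketch treats the same `(level, question_id)` pair as the same problem translated into every language.

```python
from collections import Counter
from dataclasses import dataclass

# Difficulty levels as described in the PolyMath distribution notes.
LEVELS = {
    1: "Basics (K–12)",
    2: "Advanced (High School to Upper Grades)",
    3: "High difficulty (Olympiad level)",
    4: "Cutting Edge (Advanced Mathematics and Research-Level Reasoning)",
}
NUM_LANGUAGES = 18
QUESTIONS_PER_LEVEL = 125


@dataclass(frozen=True)
class Item:
    # Hypothetical record layout for illustration; not the dataset's real schema.
    language: str
    level: int
    question_id: int


def build_index(num_languages: int = NUM_LANGUAGES,
                per_level: int = QUESTIONS_PER_LEVEL) -> list[Item]:
    """Enumerate every (language, level, question) slot of a balanced parallel grid."""
    languages = [f"lang_{i:02d}" for i in range(num_languages)]  # placeholder codes
    return [
        Item(lang, level, qid)
        for lang in languages
        for level in LEVELS
        for qid in range(per_level)
    ]


items = build_index()

# Balance check: every (language, level) cell holds the same number of questions.
cell_counts = Counter((it.language, it.level) for it in items)
assert set(cell_counts.values()) == {QUESTIONS_PER_LEVEL}

# Parallel check: every language has exactly the same set of (level, question_id) pairs.
by_language: dict[str, set[tuple[int, int]]] = {}
for it in items:
    by_language.setdefault(it.language, set()).add((it.level, it.question_id))
assert len({frozenset(s) for s in by_language.values()}) == 1

per_language = len(items) // NUM_LANGUAGES
print(per_language, len(items))  # 500 9000
```

With real data, `build_index()` would be replaced by the loaded records, and the same two assertions would flag a missing translation or an unbalanced difficulty level.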

PolyMath Multilingual Mathematical Reasoning Benchmark Dataset