P-MMEval Multi-language Multi-task Benchmark Dataset
The P-MMEval dataset is a large-scale multilingual multitask benchmark created by Alibaba Group's Tongyi Lab in 2024 to comprehensively evaluate the multilingual capabilities of large language models (LLMs). The related paper is "P-MMEVAL: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs".
The dataset contains 3 basic natural language processing (NLP) datasets and 5 advanced capability-specific datasets, covering tasks such as code generation, knowledge understanding, mathematical reasoning, logical reasoning, and instruction following. Because its translations are reviewed by experts, P-MMEval offers consistent coverage of 10 languages and provides parallel samples across them. The languages are English, Chinese, Arabic, Spanish, Japanese, Korean, Thai, French, Portuguese, and Vietnamese.