P-MMEval Multilingual Multitask Benchmark Dataset
The P-MMEval dataset is a large-scale multilingual multitask benchmark created in 2024 by Tongyi Lab at Alibaba Group to comprehensively evaluate the multilingual capabilities of large language models (LLMs). It was introduced in the paper "P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs".
The dataset contains 3 basic natural language processing (NLP) datasets and 5 advanced capability-specific datasets, covering tasks such as code generation, knowledge understanding, mathematical reasoning, logical reasoning, and instruction following. Translations are reviewed by experts, so P-MMEval covers all 10 languages consistently and provides parallel samples across them. The languages are English, Chinese, Arabic, Spanish, Japanese, Korean, Thai, French, Portuguese, and Vietnamese.