
P-MMEval Multi-language Multi-task Benchmark Dataset


P-MMEval is a large-scale multilingual, multi-task benchmark dataset created by Alibaba Group's Tongyi Lab in 2024 to comprehensively evaluate the multilingual capabilities of large language models (LLMs). The accompanying paper is "P-MMEVAL: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs".

The dataset contains 3 basic natural language processing (NLP) datasets and 5 advanced capability-specific datasets, covering tasks such as code generation, knowledge understanding, mathematical reasoning, logical reasoning, and instruction following. Through expert translation review, P-MMEval ensures consistent coverage of 10 languages and provides parallel samples across languages. These languages include English, Chinese, Arabic, Spanish, Japanese, Korean, Thai, French, Portuguese, and Vietnamese.

P-MMEval.torrent
Seeding: 1 | Downloading: 1 | Completed: 34 | Total downloads: 44
  • P-MMEval/
    • README.md
      1.48 KB
    • README.txt
      2.97 KB
    • data/
      • P-MMEval.zip
        12.72 MB
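
Once the torrent has finished downloading, the archive can be inspected before unpacking. The following is a minimal sketch using only Python's standard library. The path `P-MMEval/data/P-MMEval.zip` is taken from the file listing above, but the helper names `summarize_archive` and `extract_archive` are hypothetical, and nothing is assumed about the files inside the archive.

```python
import zipfile
from pathlib import Path


def summarize_archive(zip_path):
    """Return (member_name, size_in_bytes) pairs for every file in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()
        ]


def extract_archive(zip_path, target_dir):
    """Extract the archive into target_dir and return the extracted file paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return sorted(p for p in target.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Assumed location, mirroring the file listing; adjust to your download path.
    archive_path = Path("P-MMEval/data/P-MMEval.zip")
    if archive_path.exists():
        for name, size in summarize_archive(archive_path):
            print(f"{size:>10}  {name}")
```

Listing the members first makes it easy to see how the archive is organized before deciding where to extract it.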