HyperAI

P-MMEval Multilingual Multitask Benchmark Dataset

Date: 10 months ago

Size: 12.72 MB

Organization:

Publish URL: huggingface.co

Paper URL: arxiv.org


P-MMEval is a large-scale multilingual, multitask benchmark dataset released in 2024 by Alibaba Group's Tongyi Lab. It is designed to comprehensively evaluate the multilingual capabilities of large language models (LLMs). The accompanying paper is "P-MMEVAL: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs".

The dataset contains 3 basic natural language processing (NLP) datasets and 5 advanced capability-specific datasets, covering tasks such as code generation, knowledge understanding, mathematical reasoning, logical reasoning, and instruction following. Through expert translation review, P-MMEval ensures consistent coverage of 10 languages and provides parallel samples across languages. These languages include English, Chinese, Arabic, Spanish, Japanese, Korean, Thai, French, Portuguese, and Vietnamese.
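As an illustration of what "parallel samples across languages" means in practice, the following minimal sketch checks that every sample id is present in all ten languages. The data layout (a mapping from language code to `{sample_id: text}`), the function name `find_non_parallel_ids`, and the use of ISO 639-1 codes as split names are assumptions made for this example; they are not taken from the P-MMEval files.

```python
# ISO 639-1 codes for the ten languages listed above. Whether the dataset
# names its language splits this way is an assumption, not a documented fact.
LANGUAGES = ["en", "zh", "ar", "es", "ja", "ko", "th", "fr", "pt", "vi"]


def find_non_parallel_ids(samples_by_lang):
    """Return {sample_id: [languages missing it]} for non-parallel samples.

    samples_by_lang maps a language code to a dict of {sample_id: text}.
    This layout is hypothetical and only serves to illustrate the check.
    """
    # Collect every sample id seen in any language.
    all_ids = set()
    for samples in samples_by_lang.values():
        all_ids.update(samples)

    # Record, per sample id, the languages in which it is absent.
    missing = {}
    for sample_id in sorted(all_ids):
        absent = [
            lang
            for lang in LANGUAGES
            if sample_id not in samples_by_lang.get(lang, {})
        ]
        if absent:
            missing[sample_id] = absent
    return missing


# Toy data: every language has "q1" and "q2", then "q2" is removed from Thai.
toy = {lang: {"q1": "...", "q2": "..."} for lang in LANGUAGES}
del toy["th"]["q2"]
print(find_non_parallel_ids(toy))
```

A fully parallel benchmark would yield an empty dictionary from this check; any non-empty result points to samples that cannot be compared across all ten languages.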

P-MMEval.torrent
Seeding: 1 | Downloading: 0 | Completed: 96 | Total Downloads: 128
  • P-MMEval/
    • README.md (1.48 KB)
    • README.txt (2.97 KB)
    • data/
      • P-MMEval.zip (12.72 MB)