HyperAI

pyMethods2Test Programming Language Processing Dataset

The pyMethods2Test dataset was created by researchers at the University of Nebraska–Lincoln in 2025. It contains a large number of open-source unit test methods and their corresponding focal method mappings, and is designed to support generating effective unit test cases for Python code, filling the gap left by the lack of large test datasets for Python. The accompanying paper is "pyMethods2Test: A Dataset of Python Tests Mapped to Focal Methods". The dataset is intended for training large language models (LLMs) to generate Python unit tests, giving them the training data they need to learn how to write tests for Python code.

The dataset was built by mining 88,846 Python projects on GitHub that use the pytest and unittest frameworks, yielding 22,662,037 test methods and 2,198,378 focal method mappings.

The dataset contains more than 22 million test methods, about 2.2 million of which are mapped to focal methods. Each mapping comes with detailed context information, such as the test file path, focal file path, class name, method name, and line number. The data is stored in JSON format for easy processing, and a script for generating focal method context is also provided.
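As a rough illustration of how the JSON mappings might be consumed, the sketch below walks a directory of JSON files and prints each test method next to its focal method. The key names (`test_file`, `test_method`, `focal_file`, `focal_method`) and the one-object-per-file layout are assumptions made for this example, not the dataset's documented schema; check the README in the archive for the actual field names.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_mappings(root: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per *.json file found under root."""
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Assumption: each file holds a single JSON object; skip anything else.
        if isinstance(data, dict):
            yield data


def describe(record: dict) -> str:
    """Format one mapping; every key name here is a hypothetical placeholder."""
    test = f"{record.get('test_file', '?')}::{record.get('test_method', '?')}"
    focal = f"{record.get('focal_file', '?')}::{record.get('focal_method', '?')}"
    return f"{test} -> {focal}"
```

Calling `describe(record)` for each `record` from `iter_mappings(Path("focal-data"))` would print one line per mapping, with `?` standing in for any key that is absent. The directory name is also only an example.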

The data is stored in two ZIP files. If you only want to use the pre-mined focal data, unzip the focal-data.zip file (about 2 GB after decompression). The larger raw-data.zip file (about 42 GB after decompression) contains the raw data used to generate the focal data, such as the classes and methods extracted from the repositories.
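Because the archives expand to several gigabytes, it can help to check the uncompressed size before extracting. The following is a minimal sketch using only Python's standard library; the helper names `uncompressed_size` and `extract_if_room` are made up for this example and are not part of the dataset's tooling.

```python
import shutil
import zipfile
from pathlib import Path


def uncompressed_size(archive: Path) -> int:
    """Return the total size in bytes of all members once extracted."""
    with zipfile.ZipFile(archive) as zf:
        return sum(info.file_size for info in zf.infolist())


def extract_if_room(archive: Path, dest: Path) -> bool:
    """Extract archive into dest only if the disk has enough free space."""
    dest.mkdir(parents=True, exist_ok=True)
    needed = uncompressed_size(archive)
    if shutil.disk_usage(dest).free < needed:
        return False  # not enough room; nothing is extracted
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return True
```

For instance, `extract_if_room(Path("focal-data.zip"), Path("focal-data"))` would return `False` without writing anything if the target disk cannot hold the extracted files.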

pyMethods2Test.torrent
  • pyMethods2Test/
    • README.md (2.14 KB)
    • README.txt (4.29 KB)
    • data/
      • pyMethods2Test.zip (3.74 GB)