
APM Protein Generation Dataset

Date

3 months ago

Size

9.06 GB

Organization

Chinese Academy of Sciences
ByteDance Seed

Publish URL

zenodo.org

License

Other

This protein generation dataset was released in 2025 by Hunan University, the University of Chinese Academy of Sciences, and the ByteDance Seed Team. The associated paper is "An All-Atom Generative Model for Designing Protein Complexes".

Dataset composition

  • Single-chain protein dataset: 187,494 samples covering a variety of protein types and functions, drawn from PDB (18,684), Swiss-Prot (140,769), and AFDB (28,041).
  • Multi-chain protein dataset: 11,620 samples of complexes with 2 to 6 chains, supporting multi-chain modeling. The data are derived from PDB biological assemblies, excluding three kinds of samples: entries in the SAbDab antibody database, samples containing any chain shorter than 30 residues (treated as peptides), and samples longer than 2,048 residues or lacking a cluster ID. During training, the researchers randomly cropped multi-chain samples: for samples with more than 384 residues, the crop was centered on an interchain binding-interface residue pair and the 384 nearest residues were retained.
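The filtering and cropping rules for the multi-chain set can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Residue`/`Complex` classes, the `in_sabdab` flag, and the 8 Å C-alpha distance used to define an interface pair are all hypothetical, because the description does not say how interface residues were identified or how the center pair was chosen. Only the thresholds (30, 2,048, 384) come from the text.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Thresholds taken from the dataset description.
MIN_CHAIN_LEN = 30     # chains shorter than this are treated as peptides
MAX_TOTAL_LEN = 2048   # samples with more residues than this are excluded
CROP_SIZE = 384        # residues kept per training crop
# Hypothetical: "interface" is not defined in the description; here it means
# two residues on different chains whose C-alpha atoms are closer than 8 Å.
INTERFACE_CUTOFF = 8.0


@dataclass
class Residue:
    chain_id: str
    ca: Tuple[float, float, float]  # C-alpha coordinates in Å


@dataclass
class Complex:
    residues: List[Residue]
    in_sabdab: bool            # hypothetical flag: entry appears in SAbDab
    cluster_id: Optional[str]  # None when no cluster ID is available


def keep_sample(c: Complex) -> bool:
    """Apply the three exclusion rules to one biological assembly."""
    if c.in_sabdab:
        return False
    chain_lengths: dict = {}
    for r in c.residues:
        chain_lengths[r.chain_id] = chain_lengths.get(r.chain_id, 0) + 1
    if any(n < MIN_CHAIN_LEN for n in chain_lengths.values()):
        return False
    return len(c.residues) <= MAX_TOTAL_LEN and c.cluster_id is not None


def interface_pairs(residues: List[Residue]) -> List[Tuple[int, int]]:
    """All inter-chain residue index pairs within INTERFACE_CUTOFF (O(n^2))."""
    pairs = []
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            a, b = residues[i], residues[j]
            if a.chain_id != b.chain_id and math.dist(a.ca, b.ca) < INTERFACE_CUTOFF:
                pairs.append((i, j))
    return pairs


def crop_indices(residues: List[Residue], rng: random.Random) -> List[int]:
    """Sorted indices of the residues kept in one randomly centered crop."""
    n = len(residues)
    if n <= CROP_SIZE:
        return list(range(n))  # small samples are used whole
    pairs = interface_pairs(residues)
    if pairs:
        i, j = rng.choice(pairs)
        # Center the crop on the midpoint of a randomly chosen interface pair.
        center = tuple((p + q) / 2 for p, q in zip(residues[i].ca, residues[j].ca))
    else:
        # Hypothetical fallback when no interface is found: a random residue.
        center = rng.choice(residues).ca
    # Keep the CROP_SIZE residues nearest to the center, in original order.
    nearest = sorted(range(n), key=lambda k: math.dist(residues[k].ca, center))
    return sorted(nearest[:CROP_SIZE])
```

For a sample that passes `keep_sample`, `crop_indices(sample.residues, random.Random(seed))` returns the indices of one crop; each call may pick a different interface pair, which is one plausible reading of "randomly cropped". The quadratic pair search is acceptable here only because samples are capped at 2,048 residues.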

APM.torrent
  • APM/
    • README.md
      1.67 KB
    • README.txt
      3.34 KB
    • data/
      • APM.zip
        9.06 GB
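Once downloaded, the archive can be inspected without extracting all 9.06 GB. The following is a minimal sketch using Python's standard `zipfile` module. The local path `APM/data/APM.zip` mirrors the listing above but is an assumption about where the torrent client saves the files, and nothing is assumed about the archive's internal layout.

```python
import zipfile
from pathlib import Path


def summarize_archive(path):
    """Return (number of entries, total uncompressed bytes) of a zip file."""
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        return len(infos), sum(info.file_size for info in infos)


if __name__ == "__main__":
    # Assumed download location; adjust to wherever APM.zip was saved.
    archive = Path("APM") / "data" / "APM.zip"
    if archive.exists():
        count, total = summarize_archive(archive)
        print(f"{count} entries, {total / 1e9:.2f} GB uncompressed")
    else:
        print(f"{archive} not found")
```

Reading only the central directory via `infolist()` is cheap, so this is a quick way to check disk requirements before running a full `extractall`.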