APM Protein Generation Dataset
Date
11 days ago
Size
9.06 GB
Publish URL
License
Other
Categories
This protein generation dataset was released in 2025 by Hunan University, the University of Chinese Academy of Sciences, and the ByteDance Seed team. The accompanying paper is "An All-Atom Generative Model for Designing Protein Complexes".
Dataset composition
- Single-chain protein dataset: contains 187,494 samples covering a variety of protein types and functions, drawn from the PDB (18,684), Swiss-Prot (140,769), and AFDB (28,041) databases.
- Multi-chain protein dataset: contains 11,620 samples covering protein complexes of 2 to 6 chains, supporting multi-chain modeling. The data is derived from PDB biological assemblies, with three types of samples excluded: samples that appear in the SAbDab antibody database, samples containing any chain shorter than 30 residues (treated as peptides), and samples that are longer than 2,048 residues or lack a cluster ID. During training, the researchers randomly cropped the multi-chain samples: for samples with more than 384 residues, the crop was centered on inter-chain binding-interface residue pairs and retained the nearest 384 amino acids.
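The exclusion rules and the interface-centered crop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code. The function names, the 8 Å contact cutoff, the use of one representative coordinate per residue, the reading of "length" as the total residue count across chains, and the fallback when no interface is found are all assumptions that the dataset description does not specify.

```python
import math
import random

MIN_CHAIN_LEN = 30       # shorter chains are treated as peptides (from the description)
MAX_TOTAL_LEN = 2048     # assumed to mean total residues across all chains
CROP_SIZE = 384          # residues kept per training crop (from the description)
INTERFACE_CUTOFF = 8.0   # Angstroms; hypothetical value, not stated in the description


def keep_sample(chain_lengths, cluster_id, in_sabdab):
    """Return True if a multi-chain sample passes the three exclusion rules."""
    if in_sabdab:
        return False
    if any(n < MIN_CHAIN_LEN for n in chain_lengths):
        return False
    if sum(chain_lengths) > MAX_TOTAL_LEN or cluster_id is None:
        return False
    return True


def interface_pairs(coords, chain_ids, cutoff=INTERFACE_CUTOFF):
    """List residue index pairs (i, j) on different chains within `cutoff`."""
    pairs = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if chain_ids[i] != chain_ids[j] and math.dist(coords[i], coords[j]) <= cutoff:
                pairs.append((i, j))
    return pairs


def crop_around_interface(coords, chain_ids, crop_size=CROP_SIZE, rng=random):
    """Return sorted residue indices of a crop centered on a random interface pair."""
    n = len(coords)
    if n <= crop_size:
        return list(range(n))  # small samples are kept whole
    pairs = interface_pairs(coords, chain_ids)
    if not pairs:
        # Assumed fallback: keep the first `crop_size` residues.
        return list(range(crop_size))
    i, j = rng.choice(pairs)
    # Center the crop on the midpoint of the chosen interface pair.
    center = tuple((a + b) / 2 for a, b in zip(coords[i], coords[j]))
    nearest = sorted(range(n), key=lambda k: math.dist(coords[k], center))
    return sorted(nearest[:crop_size])
```

Here `coords` is a list of `(x, y, z)` tuples with one entry per residue, and `chain_ids` is a parallel list of chain labels. The brute-force pair search is quadratic in the number of residues, which is acceptable for a sketch but would normally be replaced by a spatial index or vectorized distance computation.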
APM.torrent