ProteinGym Protein Mutation Dataset
The dataset contains approximately 1.5 million missense variants from 87 deep mutational scanning (DMS) experiments.
The paper "Enhancing efficiency of protein language models with minimal wet-lab data through few-shot learning" uses this dataset as a benchmark. Its results were published in Nature Communications, a Nature Portfolio journal.