CAS Develops New AI Method for Protein Engineering
A research team at the Chinese Academy of Sciences (CAS) has developed a novel artificial intelligence-based method for general protein engineering. The team, led by Caixia Gao of the Institute of Genetics and Developmental Biology, introduced AiCE (AI-informed Constraints for protein Engineering), a computational approach that integrates structural and evolutionary constraints to simulate protein evolution and design functions more efficiently.
Protein engineering involves altering the amino acid sequences of proteins to modify their structures and functions. Unlike genome engineering, which targets DNA, protein engineering directly manipulates protein molecules, leveraging iterative mutations to rapidly optimize and innovate protein functions. However, traditional strategies such as structure-guided rational design and directed evolution are often labor-intensive, costly, and heavily reliant on empirical methods, which limits their scalability and practicality.
An ideal protein engineering strategy should achieve optimal performance with minimal input. Recent advances have included training specific AI models to simulate mutations and redesign protein functions, but these models struggle with versatility and require substantial computational and experimental resources.
To address these challenges, the CAS team developed AiCE, a method built on a general-purpose inverse folding model. Inverse folding means predicting amino acid sequences that are compatible with a given three-dimensional protein structure. By training on natural protein structures and sequences, these models can implicitly learn the geometric and physical properties of protein backbones and capture the complex distribution patterns shaped by evolutionary dynamics.
The AiCE method consists of two key modules: AiCEsingle and AiCEmulti. AiCEsingle focuses on predicting single amino acid substitutions.
It samples amino acid sequences from the output of an inverse folding model based on a given protein's 3D structure and screens for high-frequency amino acids using structural constraints. When tested against 60 deep mutational scanning datasets, AiCEsingle achieved a prediction accuracy of 16%, outperforming unconstrained methods by 37%. In a comparative analysis, AiCEsingle surpassed other common AI models by 36% to 90%.
To tackle the negative epistatic effects often seen when multiple mutations are combined, the team hypothesized that functionally coupled amino acid positions may exist and developed AiCEmulti. This module predicts combinations of amino acid mutations by identifying positions with predicted evolutionary coupling. Analysis of six mutation libraries demonstrated that AiCEmulti performs comparably to larger protein models such as SaProt at a much lower computational cost: identifying single and double mutants of the SpCas9 protein took just 1.15 CPU hours.
The CAS researchers further validated the efficiency and broad applicability of AiCE through wet-lab experiments on eight diverse proteins, including deaminases, nuclear localization sequences, nucleases, and reverse transcriptases. These experiments confirmed AiCE's simplicity and effectiveness. Using the optimized deaminases, the team developed new base editors for precision medicine and molecular breeding. For example, they created an adenine base editor (enABE8e) that halved the editing window, a cytosine base editor (enSdd6-CBE) with 1.3 times higher fidelity, and a mitochondrial base editor (enDdd1-DdCBE) with 13 times enhanced activity.
The study, titled "Advancing Protein Evolution with Inverse Folding Models Integrating Structural and Evolutionary Constraints," was published in the journal Cell on July 7.
The research was supported by projects from the Ministry of Agriculture and Rural Affairs, the National Natural Science Foundation of China, and the National Key R&D Program. Overall, AiCE offers a promising solution to the limitations of traditional protein engineering methods, providing a simple, efficient, and versatile approach that could revolutionize the field.
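For readers who want a concrete picture of the AiCEsingle idea described above (sample sequences from an inverse folding model, then keep amino acids that appear at high frequency subject to a structural filter), the following is a minimal illustrative sketch and not the team's implementation. Everything named here is an assumption for illustration: the function `high_frequency_substitutions`, the 0.5 frequency threshold, the toy sequences, and the use of a simple set of allowed positions as a stand-in for the structural constraints, which the article does not specify. The sampled sequences would in practice come from an inverse folding model; here they are hard-coded.

```python
from collections import Counter


def high_frequency_substitutions(wild_type, sampled_sequences,
                                 allowed_positions, min_frequency=0.5):
    """Propose single substitutions from sampled sequences.

    Positions are 0-indexed. `allowed_positions` is a hypothetical stand-in
    for a structural constraint: only these positions are considered.
    Returns (position, wild-type residue, proposed residue, frequency) tuples.
    """
    if not sampled_sequences:
        raise ValueError("at least one sampled sequence is required")

    total = len(sampled_sequences)
    proposals = []
    for pos in sorted(allowed_positions):
        # Count how often each amino acid was sampled at this position.
        counts = Counter(seq[pos] for seq in sampled_sequences)
        residue, count = counts.most_common(1)[0]
        frequency = count / total
        # Keep the most frequent residue only if it differs from the
        # wild type and clears the (assumed) frequency threshold.
        if residue != wild_type[pos] and frequency >= min_frequency:
            proposals.append((pos, wild_type[pos], residue, frequency))
    return proposals


# Toy data: a 5-residue "protein" and four hypothetical sampled sequences.
wild_type = "MKTAY"
samples = ["MKSAY", "MRSAY", "MKSAF", "MKTAY"]

print(high_frequency_substitutions(wild_type, samples,
                                   allowed_positions={1, 2, 4}))
```

In this toy example only position 2 yields a proposal (T to S, sampled in three of four sequences); at the other allowed positions the most frequent residue is the wild-type one. How AiCE actually defines its structural constraints and frequency cut-offs is described in the published study, not here.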