MIT Researchers Unlock How Protein AI Models Make Predictions, Boosting Drug Discovery
Researchers at MIT have made a significant advance in understanding how protein language models make their predictions, shedding light on the previously opaque inner workings of these powerful AI systems. These models, built on the same architecture as large language models (LLMs), have become essential tools in biology, enabling accurate predictions of protein structure and function for applications such as drug discovery and vaccine development. Despite their success, however, scientists have long struggled to understand how the models arrive at their conclusions, effectively treating them as uninterpretable "black boxes."

In a new study published in the Proceedings of the National Academy of Sciences, MIT researchers led by Bonnie Berger, the Simons Professor of Mathematics and head of the Computation and Biology group at the Computer Science and Artificial Intelligence Laboratory, developed a method to open that black box. The study's lead author is MIT graduate student Onkar Gujral, with contributions from Mihir Bafna and Eric Alm.

The team applied a technique known as a sparse autoencoder, a type of algorithm increasingly used to interpret LLMs, to protein language models for the first time. These models typically represent a protein's amino acid sequence through a dense pattern of neural activations, often involving around 480 nodes. Because each node is activated by multiple features, it is nearly impossible to determine which specific protein characteristics a given node is tracking. The sparse autoencoder expands this representation to as many as 20,000 nodes while enforcing a sparsity constraint. Spreading the information across a much larger, mostly inactive layer allows individual features to be isolated in single, distinct nodes. As a result, each node becomes more interpretable, clearly tied to a specific biological property such as protein family, cellular location, or molecular function.

To analyze these sparse representations, the researchers used the AI assistant Claude, which compared the neural activation patterns with known biological features. The AI then generated plain-language descriptions of what each node was detecting, for example identifying a neuron that responds specifically to proteins involved in ion transport across the plasma membrane.

The analysis revealed that the most commonly encoded features were protein families and key metabolic and biosynthetic processes. Notably, the interpretability emerged naturally from the sparsity constraint, even though the algorithm was not explicitly designed for it.

The work could help biologists and AI researchers select the most appropriate models for specific tasks, improve how inputs are designed, and even uncover new biological insights from model behavior. As Gujral noted, "At some point when the models get a lot more powerful, you could learn more biology than you already know, from opening up the models." The research was supported by the National Institutes of Health and marks a major step toward making AI-driven biological discovery more transparent, reliable, and scientifically informative.
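For readers who want to see the core idea in code, the short PyTorch sketch below shows one common way to build a sparse autoencoder of the kind described above: a single hidden layer widens a 480-dimensional protein embedding to 20,000 features, and an L1 penalty on the activations keeps most of them at zero. The dimensions mirror the figures quoted in the article, but the layer shapes, penalty weight, and training loop are illustrative assumptions rather than the exact implementation used in the study.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseAutoencoder(nn.Module):
    """Expand a dense protein embedding into a wider, sparse feature space and reconstruct it."""

    def __init__(self, embed_dim: int = 480, hidden_dim: int = 20_000):
        super().__init__()
        self.encoder = nn.Linear(embed_dim, hidden_dim)
        self.decoder = nn.Linear(hidden_dim, embed_dim)

    def forward(self, x: torch.Tensor):
        # ReLU keeps feature activations non-negative; the L1 penalty below drives most of them to zero.
        features = F.relu(self.encoder(x))
        reconstruction = self.decoder(features)
        return reconstruction, features


def train_step(model, optimizer, embeddings, l1_weight: float = 1e-3):
    """One optimization step: reconstruct the embedding while penalizing dense activations."""
    optimizer.zero_grad()
    reconstruction, features = model(embeddings)
    reconstruction_loss = F.mse_loss(reconstruction, embeddings)
    sparsity_loss = features.abs().mean()  # L1 penalty encourages only a few active features per protein
    loss = reconstruction_loss + l1_weight * sparsity_loss
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = SparseAutoencoder()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    # Stand-in for a batch of per-protein embeddings taken from a protein language model.
    batch = torch.randn(8, 480)
    print(train_step(model, optimizer, batch))
```

Once such a model is trained, interpreting a hidden node amounts to collecting the proteins that activate it most strongly and asking what they have in common, the step the MIT team automated with Claude.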