HyperAI

AI-Powered Language Model LassoESM Deciphers Lasso Peptide Design for Next-Gen Therapeutics

Lasso peptides, a class of naturally occurring molecules produced by bacteria, are attracting attention for two reasons. Their knot-like structures confer exceptional stability, and they show diverse biological activities, including antibacterial, antiviral, and anticancer properties. These characteristics make them promising candidates for next-generation therapeutics. To unlock their full potential, researchers from the Carl R. Woese Institute for Genomic Biology have developed LassoESM, a large language model designed specifically for lasso peptides. General-purpose protein models such as AlphaFold struggle with lasso peptides because of their distinct structural features. LassoESM, by contrast, is tailored to the unique sequence and folding patterns of these molecules. The model is based on the ESM-2 architecture and was further pre-trained with a domain-adaptive approach using masked language modeling, focusing exclusively on experimentally validated lasso peptide sequences.

The research team, led by Doug Mitchell of the Vanderbilt Institute for Chemical Biology and Diwakar Shukla of the University of Illinois Urbana-Champaign, first compiled a dataset of thousands of lasso peptide sequences through bioinformatic analysis and manual validation. This high-quality dataset was essential for training a model capable of capturing the nuances of lasso peptide biosynthesis.

Lasso peptides are formed through a precise enzymatic process involving a leader peptidase, a RiPP (ribosomally synthesized and post-translationally modified peptide) recognition element (RRE), and a lasso cyclase, which ties the linear peptide into a stable slipknot-like structure. The challenge lies in predicting which lasso cyclase can act on a given peptide sequence, a task that is difficult to solve experimentally because of the vast number of possible combinations. LassoESM addresses this by learning the "language" of lasso peptides through sequence prediction tasks.
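
The "sequence prediction task" here is masked language modeling, the objective LassoESM inherits from ESM-2. The Python sketch below illustrates that objective on a single peptide; it is not LassoESM's training code. It assumes the BERT-style scheme commonly used for ESM models (about 15% of positions chosen as targets, of which roughly 80% are replaced by a mask token, 10% by a random residue, and 10% left unchanged). The article does not report LassoESM's actual masking settings, and the helper names `mask_sequence` and `masked_lm_loss` and the example peptide are hypothetical.

```python
import math
import random

# The 20 standard amino acids; real ESM vocabularies also contain special tokens.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"  # hypothetical mask token name for this sketch


def mask_sequence(seq, mask_prob=0.15, seed=None):
    """Choose about mask_prob of positions as prediction targets (BERT-style, assumed).

    Of the chosen positions, ~80% become <mask>, ~10% become a random residue,
    and ~10% are left unchanged. Returns (inputs, labels), where labels holds
    the original residue at target positions and None everywhere else.
    """
    rng = random.Random(seed)
    tokens = list(seq)
    n_targets = max(1, round(len(tokens) * mask_prob))
    targets = rng.sample(range(len(tokens)), n_targets)
    inputs = tokens[:]
    labels = [None] * len(tokens)
    for i in targets:
        labels[i] = tokens[i]
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK
        elif r < 0.9:
            inputs[i] = rng.choice(AMINO_ACIDS)
        # otherwise keep the original residue; it is still a prediction target
    return inputs, labels


def masked_lm_loss(predicted_probs, labels):
    """Mean cross-entropy over target positions only.

    predicted_probs[i] maps each amino acid to the model's probability at
    position i; positions whose label is None do not contribute to the loss.
    """
    losses = [-math.log(probs[label])
              for probs, label in zip(predicted_probs, labels)
              if label is not None]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    seq = "GGAGHVPEYFVGIGTPISFYG"  # illustrative 21-residue peptide
    inputs, labels = mask_sequence(seq, seed=0)
    print(inputs)
    # A model that knows nothing assigns 1/20 to every residue, so the loss is ln(20), about 3.0.
    uniform = [{aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS} for _ in seq]
    print(masked_lm_loss(uniform, labels))
```

Only the chosen positions contribute to the loss, so the model is rewarded for reconstructing hidden residues from their surrounding context, which is how its internal representations come to encode sequence patterns.
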
By hiding parts of a peptide sequence and training the model to predict the missing residues, LassoESM learns the structural and functional rules governing lasso formation. The resulting embeddings (numerical representations of each sequence) support accurate predictions across multiple downstream applications. The model predicted lasso cyclase substrate tolerance, identified compatible pairs of non-cognate cyclases and peptides, and forecast RNA polymerase (RNAP) inhibitory activity, which is key for potential antibiotic development. These predictions were validated against enrichment values, and the model remained accurate even with limited training data.

“This tool allows us to move beyond trial-and-error experimentation,” said Shukla. “We can now rationally design lasso peptides with desired properties, significantly accelerating drug discovery.”

Xuenan Mi, who completed her Ph.D. in Shukla’s lab and led the computational work, emphasized the model’s efficiency and adaptability: “LassoESM enables precise property prediction even when experimental data is scarce, making it a powerful asset for biomedical and industrial applications.”

The team plans to extend the model to other classes of peptide natural products and to explore ways to engineer lasso peptides for targeted protein interactions. The work relied on close interdisciplinary collaboration and on advanced computing resources at the University of Illinois. Published in Nature Communications, the study highlights how AI-driven tools can bridge gaps in natural product discovery, paving the way for new therapeutics derived from nature’s molecular toolkit.
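
The article does not describe the downstream models built on top of LassoESM's embeddings. As a rough illustration only, the sketch below assumes each peptide's per-residue embeddings are available as plain lists of floats, mean-pools them into one vector per peptide, and predicts a property such as an enrichment value with k-nearest neighbors. The k-NN regressor, the concatenation used for cyclase-peptide pairs, the function names, and the toy two-dimensional vectors and values are all hypothetical stand-ins, not the study's method.

```python
import math

# Toy stand-ins for LassoESM outputs: each peptide is a list of per-residue vectors.
# Real embeddings would be high-dimensional; 2-D vectors keep the example readable.


def mean_pool(residue_embeddings):
    """Average per-residue vectors into one fixed-length vector per sequence."""
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(vec[d] for vec in residue_embeddings) / n for d in range(dim)]


def pair_features(cyclase_vec, peptide_vec):
    """Concatenate a cyclase vector and a peptide vector for pair-level prediction (hypothetical)."""
    return list(cyclase_vec) + list(peptide_vec)


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def knn_predict(query, train_vectors, train_values, k=2):
    """Predict a property as the mean value of the k most similar training vectors."""
    ranked = sorted(zip(train_vectors, train_values),
                    key=lambda pair: cosine(query, pair[0]),
                    reverse=True)
    nearest = [value for _, value in ranked[:k]]
    return sum(nearest) / len(nearest)


if __name__ == "__main__":
    # Hypothetical training set: pooled embeddings with made-up enrichment values.
    train_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    train_values = [2.0, 1.8, 0.2, 0.4]
    query = mean_pool([[1.0, 0.0], [0.8, 0.2]])  # [0.9, 0.1]
    print(knn_predict(query, train_vectors, train_values))  # about 1.9
    print(len(pair_features([0.1] * 4, [0.2] * 3)))  # 7
```

Mean pooling yields a fixed-length vector regardless of peptide length, so any standard regressor or classifier can sit on top of it; for a pair-level task such as cyclase compatibility, the same kind of predictor could be run on `pair_features` vectors.
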
