HyperAI

New AI tool reveals hidden microproteins in the human genome’s "dark matter"

7 days ago

A new AI-powered tool called ShortStop is helping scientists uncover microproteins: tiny, previously overlooked proteins encoded in the vast stretches of the human genome once dismissed as "junk DNA." Developed by researchers at the Salk Institute, ShortStop uses machine learning to sift through existing genetic data and identify the small open reading frames (smORFs) most likely to produce biologically relevant microproteins.

Microproteins, typically fewer than 150 amino acids long, are much smaller than the proteins scientists have long studied. Their size makes them difficult to detect with traditional protein analysis techniques, so they were largely ignored, even though emerging research suggests they play critical roles in cellular processes and may be involved in diseases such as cancer, Alzheimer’s, and obesity.

ShortStop addresses a major bottleneck in microprotein discovery: current methods cannot distinguish functional smORFs, which actually produce meaningful microproteins, from nonfunctional ones. The tool is trained using a dataset of computer-generated random smORFs as a negative control. By comparing real smORFs against these decoys, ShortStop predicts which ones are more likely to be biologically active, sharply reducing the number of candidates that need to be tested experimentally.

In one key application, the Salk team used ShortStop to analyze a lung cancer dataset. The tool identified 210 previously unknown microprotein candidates, including one that was significantly more active in tumor tissue than in healthy tissue. This microprotein, validated in human cells and tissues, stands out as a potential biomarker or therapeutic target for lung cancer.

The method works with widely available data types such as RNA sequencing datasets, so researchers anywhere can apply ShortStop without running new experiments or buying specialized equipment. That lets scientists survey microprotein activity in both healthy and diseased tissues across far larger collections of data than before.

According to first author Brendan Miller, a postdoctoral researcher in Alan Saghatelian’s lab, the tool’s strength lies in its accessibility and efficiency. “We can now search for microproteins across large datasets quickly and intelligently,” he said. “This opens the door to discovering new players in human biology and disease.”

Saghatelian, senior author and professor at Salk, emphasized the broader implications: “There’s a wealth of data already out there that we can now mine with tools like ShortStop. We’re not just finding new proteins; we’re uncovering new pathways, new mechanisms, and new opportunities for treating disease.”

The findings were published in BMC Methods, marking a significant step forward in decoding the “dark side” of the genome and unlocking the potential of microproteins in medicine and biology.
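
For readers who want a concrete picture of the decoy idea described above, here is a minimal Python sketch. It is not ShortStop's code. The article does not say how the random smORFs are generated or which features and model the tool uses, so everything here is an assumption made for illustration: the function names (`find_smorfs`, `random_decoy`, `build_training_set`), the forward-strand-only scan, the ATG start codon requirement, and the choice to match each decoy to the length of a real smORF.

```python
import random

# Standard DNA stop codons and the canonical start codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_smorfs(seq, max_aa=150):
    """Return (start, end) spans of ATG...stop ORFs on the forward strand.

    Only ORFs encoding fewer than max_aa amino acids are kept, mirroring the
    size range the article gives for microproteins. (Illustrative scan only.)
    """
    seq = seq.upper()
    spans = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3  # coding codons, stop excluded
                if n_aa < max_aa:
                    spans.append((start, i + 3))
                start = None
    return spans


def random_decoy(n_codons, rng):
    """Build a random ORF: ATG, then random non-stop codons, then a stop.

    n_codons counts coding codons including the ATG. This generation scheme
    is an assumption; the article does not describe the real one.
    """
    sense = [a + b + c for a in "ACGT" for b in "ACGT" for c in "ACGT"
             if a + b + c not in STOP_CODONS]
    body = "".join(rng.choice(sense) for _ in range(n_codons - 1))
    return START_CODON + body + rng.choice(sorted(STOP_CODONS))


def build_training_set(real_orfs, seed=0):
    """Pair each real smORF (label 1) with a length-matched decoy (label 0)."""
    rng = random.Random(seed)
    examples = [(orf, 1) for orf in real_orfs]
    for orf in real_orfs:
        n_codons = len(orf) // 3 - 1  # drop the stop codon from the count
        examples.append((random_decoy(n_codons, rng), 0))
    return examples


if __name__ == "__main__":
    transcript = "CCATGAAATTTTAGGG"  # toy sequence, not real data
    orfs = [transcript[s:e] for s, e in find_smorfs(transcript)]
    for sequence, label in build_training_set(orfs):
        print(label, sequence)
```

A classifier trained on such labeled pairs would then score new smORFs by how unlike the decoys they look. Which sequence features carry that signal is the substance of the published method and is not reproduced here.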
