AI Tool DEGU Enhances Genomic Predictions with Better Accuracy and Explainability
Artificial intelligence is transforming biology, and deep neural networks (DNNs) have become essential tools for predicting the outcomes of genomic experiments. A major challenge remains, however: these models often provide answers without any indication of how confident or reliable they are. This lack of transparency limits their usefulness in scientific research, where knowing how far to trust a result, and why, is critical. Peter Koo, Associate Professor at Cold Spring Harbor Laboratory (CSHL), explains the problem: “Right now, many AI tools—whether large language models or DNNs in genomics—produce outputs in the same format, regardless of how certain they are. We need better ways to assess their confidence.”
To address this, Koo, former CSHL postdoc Jessica Zhou, and graduate student Kaeli Rizzo have developed a new AI method called DEGU (Distilling Ensembles for Genomic Uncertainty-aware models). DEGU improves both the accuracy and the interpretability of genomic predictions by enabling models to report how uncertain they are about each result.
A common way to estimate that uncertainty is deep ensemble learning: train several models, say 10, on the same task and combine their predictions. The averaged prediction tends to be more accurate than any single model's, and the degree to which the models disagree signals how uncertain the prediction is. While this increases reliability, it also demands significant computational resources, and with ten models to inspect it becomes difficult to understand why a particular prediction was made.
DEGU overcomes these limitations with a technique called deep ensemble distribution distillation. Instead of relying on multiple models, DEGU distills the collective behavior of an ensemble into a single, compact model that learns to reproduce both the ensemble's average prediction and the spread of its members' predictions. This model retains the predictive power of the original ensemble but is far more efficient: for a ten-model ensemble, it is roughly one-tenth the combined size, while still offering reliable uncertainty estimates. “Instead of analyzing ten models at once, you’re working with one model that’s one-tenth the size but performs just as well,” Rizzo says.
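The ensemble-then-distill idea can be illustrated with a toy sketch. This is not DEGU's implementation: tiny linear regressors stand in for DNNs, the data are synthetic, each ensemble member is fit on a bootstrap resample (bagging) rather than from a different random initialization as in deep ensembles, and all function names (fit_least_squares, ensemble_stats, student, and so on) are hypothetical. The ensemble's uncertainty is taken as the spread (variance) of its members' predictions; a single "student" model with a mean head and a variance head is then trained only on the ensemble's outputs, so it can report both the prediction and its uncertainty in one pass.

```python
import random
import statistics


def solve(a, b):
    """Solve the square system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_least_squares(features, targets):
    """Least-squares weights via the normal equations (X^T X) w = X^T y."""
    k = len(features[0])
    xtx = [[sum(f[i] * f[j] for f in features) for j in range(k)] for i in range(k)]
    xty = [sum(f[i] * t for f, t in zip(features, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(w, feats):
    return sum(wi * fi for wi, fi in zip(w, feats))


def linear(x):
    return [1.0, x]


def quadratic(x):
    return [1.0, x, x * x]


# Hypothetical toy data standing in for genomic measurements.
rng = random.Random(0)
xs = [i / 49 for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.3) for x in xs]

# 1) Teacher: an ensemble of 10 members, each fit on a bootstrap resample.
ensemble = []
for _ in range(10):
    idx = [rng.randrange(len(xs)) for _ in xs]
    ensemble.append(fit_least_squares([linear(xs[i]) for i in idx], [ys[i] for i in idx]))


def ensemble_stats(x):
    """Ensemble mean and variance (member disagreement) at input x."""
    preds = [predict(w, linear(x)) for w in ensemble]
    return statistics.fmean(preds), statistics.pvariance(preds)


# 2) Student: one model with two heads, trained only on the teacher's outputs.
targets = [ensemble_stats(x) for x in xs]
mean_head = fit_least_squares([linear(x) for x in xs], [m for m, _ in targets])
var_head = fit_least_squares([quadratic(x) for x in xs], [v for _, v in targets])


def student(x):
    """Single-model prediction plus an uncertainty (standard deviation) estimate."""
    mean = predict(mean_head, linear(x))
    std = max(predict(var_head, quadratic(x)), 0.0) ** 0.5
    return mean, std


for x in (0.0, 0.5, 1.0, 3.0):
    m_t, v_t = ensemble_stats(x)
    m_s, s_s = student(x)
    print(f"x={x:.1f} ensemble {m_t:.3f} +/- {v_t ** 0.5:.3f} | student {m_s:.3f} +/- {s_s:.3f}")
```

Because every member here is linear, the ensemble mean is exactly linear and the ensemble variance exactly quadratic in x, so this toy student matches its teacher exactly; a real DNN student would only approximate its ensemble. Regressing the variance rather than the standard deviation is a simplification chosen to keep the toy heads linear in their parameters. The student's reported uncertainty also grows when it is queried far outside the training range (x = 3.0), mirroring where the ensemble members disagree most.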
“And because it’s a single model, it’s much easier to trace what factors are influencing its predictions,” Rizzo adds. The team found that DEGU-trained models not only outperformed standard DNNs in accuracy but also provided clearer, more actionable explanations for their outputs. This matters for guiding real-world biological research, where experiments are costly and time-consuming. “Lab experiments are expensive,” Rizzo notes. “If we can make AI models more reliable and transparent, scientists can avoid chasing false leads and focus on hypotheses that are more likely to succeed.”
The researchers are now working to refine DEGU’s efficiency and make it accessible to scientists worldwide. By quantifying uncertainty and improving interpretability, DEGU has the potential to accelerate AI-driven discoveries in genomics and beyond, turning AI from a black box into a trusted partner in scientific exploration.
