Clinically informed AI outperforms foundation models in predicting spinal cord disease, flagging at-risk patients up to 30 months before diagnosis and potentially enabling earlier intervention and better outcomes.
A new study led by researchers at Washington University in St. Louis demonstrates that AI models informed by clinical expertise outperform large, general-purpose foundation models in predicting cervical spondylotic myelopathy (CSM), a common cause of spinal cord dysfunction in older adults. The findings, published in npj Digital Medicine, show that the clinically guided AI approach can identify patients at risk of CSM up to 30 months before a formal diagnosis, potentially enabling earlier intervention and better outcomes.
CSM results from age-related arthritis in the neck that compresses the spinal cord, leading to progressive symptoms such as neck pain, weakness, and walking difficulties. Because early signs are often subtle and overlooked, diagnosis frequently occurs too late for optimal treatment.
The research team, including surgeon-scientists, computer scientists, and data experts, developed an AI system designed to detect early warning signs in electronic health records (EHRs). Salim Yakdan, MD, a postdoctoral research fellow in the Taylor Family Department of Neurosurgery, and Ben Warner, a doctoral student in computer science and engineering, served as co-first authors. They analyzed EHR data from over 2 million individuals, using seven different AI models to identify patterns in healthcare interactions (such as prior diagnoses, imaging tests, and visits) that signal early CSM risk.
Jacob Greenberg, MD, assistant professor of neurosurgery and a neurological spine surgeon, emphasized the clinical challenge: “We wanted to know if we could use EHR data to identify patients early enough, well before symptoms become severe, so we could intervene in time to improve outcomes.”
The team tested models across two datasets: a large, diverse external dataset and a smaller, local dataset from a St. Louis health system.
While large foundation models pretrained on vast clinical data performed well in internal validation, they struggled to generalize across different healthcare systems. In contrast, a smaller, custom-built model designed with clinical insights showed more consistent and reliable performance across both datasets.
“This was surprising,” said Warner, who works in the lab of Chenyang Lu, the Fullgraf Professor, director of the AI for Health Institute, and a co-senior author of the study. “We found that a simpler model, grounded in real-world clinical understanding, outperformed more complex, data-heavy systems when applied to new environments.”
The study highlights a key lesson: for complex, variable conditions like CSM, clinical knowledge is not just helpful; it is essential. As Lu noted, “Generalizability is one of the biggest hurdles for AI in medicine. Our results show that embedding clinical expertise into AI models leads to more robust, trustworthy tools that work across different hospitals and patient populations.”
Greenberg added, “AI has huge potential in healthcare, but we must go beyond data-driven approaches alone. Integrating clinical experience ensures models are not only accurate but also practical and reliable in real-world settings.”
The findings suggest that future AI tools for disease prediction should balance data scale with clinical insight, particularly for conditions where early detection can dramatically improve patient outcomes.
