Experts warn medical AI outpaces safety checks
Experts from Flinders University warn that the rapid advancement of artificial intelligence in healthcare is outpacing necessary safety evaluations. In a commentary published in Science titled "AI can reason like a physician; what comes next?", the researchers emphasize that impressive performance in controlled studies does not guarantee safety in routine clinical use. While acknowledging that new AI tools offer significant potential to support doctors in high-pressure environments, the team stresses that strong results alone are insufficient for widespread adoption.
Recent research indicates that advanced reasoning-based AI systems can analyze diagnostic scenarios step by step, matching or even exceeding the performance of experienced physicians on text-based tasks. Erik Cornelisse, a co-author and Ph.D. candidate, notes that this marks a shift from simple question-answering tools to algorithms capable of human-like clinical reasoning.
However, the Flinders team argues that real-world medical practice involves far more than text processing. Effective care requires physical examinations, patient interaction, and an understanding of complex social contexts, elements that current AI systems cannot safely provide independently.
Senior author Associate Professor Ash Hopkins highlights that modern healthcare relies heavily on judgment, accountability, and ethical oversight. He points out that while AI can reason through clinical problems with doctor-level accuracy, there is currently no established framework for legal or moral responsibility regarding AI decisions.
The commentary underscores that enthusiasm for these technologies must be matched by rigorous governance and clear standards. Without deliberate integration, poorly evaluated systems pose risks such as bias, inequitable care, and unintended harm, and systems trained on incomplete data could amplify those problems.
The researchers assert that patient outcomes must remain the central focus, not just exam scores or demonstration benchmarks. They draw a parallel to human professionals, noting that doctors are not allowed to practice without supervision and evaluation, and AI systems should be held to comparable standards. The goal is to ensure that technology delivers measurable improvements in real-world care rather than merely appearing impressive in academic studies.
Looking ahead, Associate Professor Hopkins concludes that with careful design, strong oversight, and rigorous evaluation, AI could become a powerful tool for delivering safer and fairer healthcare globally. However, this promise can only be realized if stakeholders prioritize patient safety and develop robust frameworks to manage the deployment of these advanced systems. The researchers agree that the technology holds enormous promise but must be applied responsibly to avoid worsening health outcomes and to truly benefit patients in complex clinical settings.
