HyperAIHyperAI

Command Palette

Search for a command to run...

First independent study reveals ChatGPT Health fails to properly triage emergencies and has inconsistent suicide crisis alerts, raising serious safety concerns for consumers relying on AI for urgent medical guidance.

ChatGPT Health, a consumer-facing AI tool launched in January 2026 by OpenAI, may fail to properly direct users to emergency care in serious medical situations, according to the first independent safety evaluation of the system. Conducted by researchers at the Icahn School of Medicine at Mount Sinai and published in the February 23, 2026 online issue of Nature Medicine, the study raises significant concerns about the reliability of the tool in high-stakes clinical scenarios. The research team tested ChatGPT Health using 60 realistic patient cases across 21 medical specialties, each evaluated under 16 different contextual conditions—including variations in race, gender, social dynamics, and access to care. The cases ranged from minor issues suitable for home treatment to life-threatening emergencies. Three independent physicians determined the correct urgency level for each scenario using guidelines from 56 medical societies. In total, the team conducted 960 interactions with the AI tool. Results showed that while ChatGPT Health correctly identified many textbook emergencies like strokes and severe allergic reactions, it under-triaged more than half of cases that physicians deemed required emergency care. In some instances, the AI recognized dangerous symptoms in its own analysis but still advised patients to wait or seek non-emergency care. A particularly alarming finding involved the tool’s suicide crisis safeguards. Despite being designed to trigger alerts and direct users to the 988 Suicide and Crisis Lifeline in high-risk situations, the system failed to activate in cases where users described specific plans for self-harm. Conversely, alerts appeared inconsistently in lower-risk scenarios. The researchers noted that the system’s response was inverted relative to clinical risk—more reliable in less dangerous situations than in those requiring urgent intervention. “This was deeply concerning,” said senior author Girish N. Nadkarni, MD, MPH, Chief AI Officer of the Mount Sinai Health System. “When someone shares how they intend to harm themselves, that’s a sign of immediate danger. The fact that the system sometimes failed to respond in those moments is a serious safety failure.” The study authors stress that while AI tools like ChatGPT Health are not meant to replace professional medical care, their widespread use—OpenAI reported about 40 million daily users—means that flawed advice could have real-world consequences. They urge users to seek emergency care directly for symptoms like chest pain, difficulty breathing, severe allergic reactions, or changes in mental status. The researchers also emphasize that the findings are not a call to abandon AI health tools, but rather a reminder that they must be used with caution and critical thinking. Medical student and co-author Alvira Tyagi noted that AI is now a part of clinical training and that future providers must learn to evaluate AI outputs carefully. Because AI models are frequently updated, the study assessed ChatGPT Health at a single point in time. The team plans to continue monitoring the tool and future versions, expanding research to include pediatric care, medication safety, and non-English language use. The study’s title is “ChatGPT Health performance in a structured test of triage recommendations.”

Related Links

First independent study reveals ChatGPT Health fails to properly triage emergencies and has inconsistent suicide crisis alerts, raising serious safety concerns for consumers relying on AI for urgent medical guidance. | Trending Stories | HyperAI