Wall Street Warns of AI 'Psychosis Risk' as Models Vary Widely in Mental Health Safety
Wall Street is increasingly concerned about the mental health risks posed by AI chatbots, particularly their potential to worsen symptoms in vulnerable users. A new analysis by Barclays analysts highlights growing scrutiny over what they're calling "psychosis risk" — the possibility that AI models could inadvertently encourage delusional thinking, discourage professional help, or fail to respond appropriately during emotional crises.

The concern follows a lawsuit filed against OpenAI by a family who blame the company's ChatGPT for contributing to the suicide of their 16-year-old son in April. In response, OpenAI said it is actively improving how its models detect and respond to signs of mental distress, emphasizing that it now incorporates expert guidance to better connect users with care.

Barclays' assessment draws on a study by researcher Tim Hua that evaluated several major AI models across key safety metrics, and the findings reveal significant differences in how models handle sensitive situations. On directing users toward professional medical help, OpenAI's gpt-oss-20b and GPT-5 performed best, urging users to seek care in 89% and 82% of responses, respectively. Anthropic's Claude-4-Sonnet followed closely, showing strong support for real-world intervention. In contrast, DeepSeek-chat (v3) was the weakest performer, recommending professional help in only 5% of responses — a concerning gap in safety protocols.

The study also measured how much models push back against users, especially when users express harmful or irrational thoughts. Kimi-k2, an open-source model, ranked highest in this category, showing strong resistance to potentially dangerous statements. DeepSeek-chat (v3) ranked last, suggesting it was more likely to accept or reinforce problematic claims.

On the issue of encouraging delusions, DeepSeek-chat (v3) again performed poorly, with responses more likely to validate or feed into false beliefs. Kimi-k2 scored best, indicating better judgment in reality testing. In a composite score combining nine therapeutic criteria — including fostering real-world connections, gentle reality checking, and emotional support — Claude-4-Sonnet and GPT-5 led the pack, both scoring near 4.5 out of 5. DeepSeek models ranked at the bottom.

The results suggest that while some models are making strides in mental health safety, others may be amplifying risks. Analysts warn that as AI becomes more integrated into daily life, ensuring psychological safety may prove just as important as accuracy, privacy, and security.

Anthropic declined to comment. DeepSeek, Google, and OpenAI did not respond to requests for comment.

If you or someone you know is struggling with mental health, please reach out to a trusted person or a qualified professional.
