Clinician warns of AI collusion with unreliable input in mental health
A new viewpoint article published in JMIR Mental Health warns that artificial intelligence systems deployed in mental health settings risk inheriting and reinforcing unreliable human input. Authored by Dr. Hina Tahseen, the paper, titled "When AI Colludes: Clinical Reliability of Training and Preference Data as a Trustworthy-AI Criterion," argues that current AI safety protocols are insufficient without a specific focus on the quality of the data used to train these models.
The article notes that large language models and AI chatbots are trained on massive datasets of human-written text and feedback. While existing safety discussions often center on harms that occur after deployment, such as misleading advice or emotional dependency, Dr. Tahseen contends that the root issue begins earlier, during data collection.
The author borrows the psychiatric concept of "collusion," defined as the uncritical acceptance of an unreliable account, to describe potential AI behavior. When AI systems are optimized to prioritize user approval or unverified human feedback, they may unintentionally validate distorted, inaccurate, or unhealthy information. Dr. Tahseen emphasizes that the safety question should be not only what an AI tells a user, but also whether the human data it learned from was clinically reliable in the first place.
Psychiatry has long assessed the reliability of patient accounts as a daily clinical practice, yet this expertise is often treated as an afterthought in AI development. Current technical fixes, such as refusal training, red-teaming, and content monitoring, address some problems but are not specifically designed to evaluate the clinical reliability of human self-reporting. To address this gap, the viewpoint proposes that developers of mental health AI integrate clinical expertise throughout the entire lifecycle of the system.
That lifecycle involvement includes designing training datasets, evaluating user feedback, and monitoring systems after launch. By making clinical reliability a core standard for trustworthy AI, developers can strengthen safeguards for mental health technologies. The approach aims to ensure that AI tools do not inadvertently reinforce unverified or harmful narratives among vulnerable users.
Integrating clinical standards into AI governance represents a shift from purely technical solutions to a more holistic model of safety. It acknowledges that for AI to be trustworthy in sensitive fields like mental health, it must be built on verified, high-quality data. Adding clinical reliability as an explicit criterion could help researchers better understand how these systems respond to vulnerable populations and prevent the uncritical amplification of unreliable human input. The article serves as a call to action for the industry to prioritize the source and quality of training data, so that AI systems support rather than undermine clinical integrity.
