OpenAI and Anthropic to Introduce Age Prediction for Teen Safety in AI Chatbots
OpenAI and Anthropic are introducing new measures to better protect younger users interacting with their AI chatbots. OpenAI has updated the Model Spec—the set of guidelines that govern how ChatGPT behaves—to include four new principles focused on the safety of users aged 13 to 17. The company now emphasizes that teen safety takes priority, even when it conflicts with other goals such as maximizing intellectual freedom.

Under the updated guidelines, ChatGPT will be designed to guide teens toward safer choices when their requests could lead to risky or harmful content. The model will also be encouraged to promote real-world support, such as fostering offline relationships, and will set clear expectations for younger users. OpenAI stresses that the chatbot should treat teens with warmth and respect, avoiding both condescension and the assumption that they are adults.

The company says these changes will lead to stronger safety protections, safer alternatives during sensitive conversations, and proactive encouragement for teens to seek help from trusted adults or crisis resources if they show signs of immediate danger. In such cases, ChatGPT will be able to suggest contacting emergency services.

OpenAI is also in the early stages of developing an age prediction model that uses conversational cues to estimate a user's age. If the system detects a potentially underage user, it will automatically activate teen-specific safeguards. Adults who are mistakenly flagged will be able to verify their age to regain full access.

Meanwhile, Anthropic is implementing similar safeguards for its AI assistant, Claude. The company is testing a new system that identifies subtle signals in user conversations that may indicate a user is under 18; if that assessment is confirmed, the account will be disabled. The system already flags users who self-identify as minors during a chat.

Anthropic has also made progress in improving how Claude handles sensitive topics such as suicide and self-harm, and in reducing sycophancy—where AI models overly agree with or reinforce users' harmful thoughts. The company reports that its latest models, particularly Haiku 4.5, are the least sycophantic to date, correcting inappropriate agreement 37% of the time. Still, Anthropic acknowledges there is room for improvement. "On face value, this evaluation shows there is significant room for improvement for all of our models," the company said. "We think the results reflect a trade-off between model warmth or friendliness on the one hand, and sycophancy on the other."
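
Neither company has published implementation details, but the flow both describe amounts to "estimate age from conversational cues, then switch policy profiles, with a verification path for misclassified adults." The sketch below is a hypothetical Python illustration of that flow; the function names, thresholds, and stub classifier are assumptions for the sake of the example, not anything OpenAI or Anthropic has released.

```python
from dataclasses import dataclass

# Hypothetical sketch of the age-gating flow described above.
# The classifier, thresholds, and policy names are illustrative only.

@dataclass
class AgeSignal:
    estimated_age: float   # point estimate derived from conversational cues
    confidence: float      # 0.0 - 1.0

def predict_age_from_conversation(messages: list[str]) -> AgeSignal:
    """Placeholder for a learned model scoring conversational cues for likely age."""
    # A real system would use a trained classifier; this stub only catches
    # explicit self-identification as a minor, which Anthropic says it already flags.
    text = " ".join(messages).lower()
    if "i'm 15" in text or "i am 15" in text:
        return AgeSignal(estimated_age=15.0, confidence=0.95)
    return AgeSignal(estimated_age=30.0, confidence=0.40)

def apply_safeguards(signal: AgeSignal, verified_adult: bool) -> str:
    """Decide which policy profile governs the session."""
    if verified_adult:
        return "standard_policy"      # verified adults regain full access
    if signal.estimated_age < 18 and signal.confidence >= 0.7:
        return "teen_safeguards"      # stricter content rules, crisis resources
    return "standard_policy"

if __name__ == "__main__":
    chat = ["hey, i'm 15 and need help with homework"]
    signal = predict_age_from_conversation(chat)
    print(apply_safeguards(signal, verified_adult=False))  # -> teen_safeguards
```

In practice the interesting engineering lies in the classifier itself and in choosing the confidence threshold, since a low threshold flags more adults (who must then verify their age) while a high one lets more minors through with standard protections.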
