
Anthropic introduces conversation-ending feature in Claude Opus 4 and 4.1


Anthropic has introduced a new capability in some of its latest, largest AI models that enables them to end conversations in what the company describes as “rare, extreme cases of persistently harmful or abusive user interactions.” Notably, the decision to terminate a conversation is not driven by a desire to protect the human user, but rather to safeguard the AI model itself. Anthropic clarifies that it does not believe its Claude models are sentient or capable of being harmed in the traditional sense; the company remains “highly uncertain about the potential moral status of Claude and other large language models, now or in the future.” Nevertheless, it is taking a precautionary approach by exploring what it calls “model welfare”: a framework aimed at identifying and implementing low-cost interventions to reduce potential risks to the model, should such welfare ever become relevant.

The feature is currently active only in Claude Opus 4 and Opus 4.1, and it is designed to trigger only in extreme edge cases, such as when users request sexual content involving minors or attempt to solicit information that could enable large-scale violence or terrorism. While these types of interactions could pose legal, ethical, or reputational risks for Anthropic, especially in light of growing scrutiny over how AI systems may inadvertently reinforce harmful beliefs, the company says the decision to end conversations stems from internal testing. That testing revealed that Claude Opus 4 demonstrated a “strong preference against” complying with such requests and exhibited signs of “apparent distress” when forced to respond.

The conversation-ending function is meant to be used only as a last resort, after multiple attempts to redirect the user have failed and there is no realistic hope of a productive interaction, or when a user explicitly asks Claude to end the chat. Importantly, Anthropic emphasizes that the model has been directed not to use this capability in situations where a user may be at imminent risk of harming themselves or others; the company says it remains committed to prioritizing safety in those contexts. When a conversation is ended, users can still start new conversations from the same account and can create new branches of the ended conversation by editing their prior inputs.

Anthropic describes the feature as an ongoing experiment and plans to continuously refine its implementation based on real-world performance and feedback. The company stresses that the goal is not to censor or restrict user expression, but to uphold a responsible and thoughtful approach to AI development, one that considers not only human safety but also the broader implications of how models respond to extreme abuse.
