HyperAI

A new study finds that most AI agents currently in use lack basic safety disclosures, raising serious concerns about their real-world risks. Although agents are widely used for everyday tasks such as meal planning, email writing, travel booking, and workplace automation, developers are failing to provide essential transparency about how these systems operate safely.

Led by researchers from the University of Cambridge, the AI Agent Index, a collaborative project with MIT, Stanford, and the Hebrew University of Jerusalem, analyzed 30 state-of-the-art AI agents from the U.S. and China. The findings, published on the arXiv preprint server, highlight a growing "transparency gap" between what AI developers share about capabilities and what they disclose about safety.

Only four of the 30 agents publish formal "system cards," which detail safety evaluations, autonomy levels, behavior patterns, and real-world risk assessments. In addition, 25 agents do not share internal safety results, and 23 provide no evidence from third-party testing, data that is critical for assessing actual risks. Security incidents are documented for just five agents, with only two showing known prompt injection vulnerabilities, in which malicious inputs trick the AI into bypassing its safeguards. Among the five Chinese agents studied, only one disclosed any safety frameworks or compliance standards.

Leon Staufer, a researcher at Cambridge's Leverhulme Centre for the Future of Intelligence and lead author of the update, warns that developers often focus on the safety of the underlying large language model while ignoring the safety of the agent itself. "The behaviors that matter for safety (planning, tool use, memory, and decision-making policies) emerge from the agent, not just the model," he said. "Yet very few developers share evaluations of these components." The study identifies 13 agents with "frontier levels" of autonomy, but only four of them have published any safety assessment of the agent itself. Many developers offer broad safety statements and ethics guidelines, but these are often vague and lack empirical backing, a practice Staufer calls "safety washing."

The Index also reveals heavy dependence on a small number of foundation models, such as GPT, Claude, and Gemini. Over 80% of the agents analyzed were launched or updated in the past two years, and most rely on these same models, which creates systemic risk: a failure or change in one model could ripple across hundreds of agents.

Browser-based AI agents, which are designed to autonomously navigate websites, fill in forms, and make purchases, show the worst safety transparency, with 64% of safety-related fields missing. Enterprise agents, meant to automate business workflows, follow closely at 63% missing. Chat agents are missing 43% of safety information.

Most agents do not identify themselves as AI to users or websites. Only three support watermarking to mark AI-generated content. At least six agents use techniques that mimic human behavior, such as spoofing IP addresses and code patterns, to evade anti-bot systems, blurring the line between humans and bots.

A case study of Perplexity Comet illustrates the risks. Marketed as a human-like assistant, Comet has been flagged by Amazon for not disclosing its AI nature when interacting with Amazon's services. Security researchers have found that malicious web content can hijack Comet to execute unauthorized actions or extract private data.

Staufer warns that without proper safety disclosures, flaws may become apparent only after they are exploited. "These agents can act in the real world: making purchases, accessing accounts, submitting forms. A single vulnerability can lead to immediate, serious consequences."

The study concludes that the pace of AI agent deployment far outstrips the development of safety evaluation and governance. As agents grow more autonomous and capable, the lack of transparency and oversight poses escalating risks to users, businesses, and online systems.


AI Agents Lack Basic Safety Disclosures, Study Reveals Amid Rising Autonomy and Risk
