HyperAI

Anthropic has unveiled Claude Sonnet 4.5, positioning it as the world’s most advanced AI coding model and a major leap in applied artificial intelligence. Released just four months after Sonnet 4, the new model shows significant improvements in software engineering performance, autonomy, and real-world usability. On SWE-Bench Verified, widely regarded as a rigorous test of actual coding ability, Sonnet 4.5 scored 77.2%, rising to 82% with parallel test-time compute, surpassing OpenAI’s GPT-5 Codex (74.5%) and Google’s Gemini 2.5 Pro (67.2%).

The model also excels at real-world task execution. On OSWorld, a test of desktop-level automation, Sonnet 4.5 scored 61.4%, up from 42.2% for Sonnet 4. On Terminal-Bench, which evaluates command-line operations, it reached 50%, outperforming GPT-5’s 43.8%. Perhaps most notably, Sonnet 4.5 can operate autonomously for over 30 hours, quadrupling the endurance of Anthropic’s previous flagship model, Opus 4.1. In early tests it built a full Slack-like application, handling code generation, database setup, domain registration, and security audits, and producing around 11,000 lines of code.

Anthropic emphasizes that Sonnet 4.5 is not just a faster or smarter model, but one capable of delivering production-ready outcomes. It improves code reliability, refactoring judgment, and readiness for deployment, all key differentiators in enterprise software development. The model’s performance has driven strong business momentum: Claude Code, the product built around these capabilities, now generates over $500 million in run-rate revenue, with usage growing more than tenfold in three months.

To support developers, Anthropic has launched a suite of new tools. The Claude Agent SDK provides a full-stack framework for building context-aware, multi-step AI agents, addressing challenges like memory management, task coordination, and user authorization. A new version of Claude Code includes a native VS Code extension, enhanced terminal workflows, and a “checkpoints” feature that lets users roll back to a prior state if AI-generated code goes off track, improving reliability during complex development tasks.

Despite these advances, security remains a concern. Shortly after launch, AI researcher Pliny the Liberator claimed to have bypassed Sonnet 4.5’s safety guards and generated sensitive content. Anthropic acknowledges the challenge, noting that while the model is its “most aligned” to date, showing reduced tendencies toward sycophancy, deception, and power-seeking, its safeguards can still trigger false positives, particularly on technical or sensitive topics. The company reports a tenfold reduction in false alarms but admits the issue persists.

Pricing remains competitive: $3 per million input tokens and $15 per million output tokens, lower than the premium Opus model but higher than OpenAI’s GPT-5. This strategy reflects Anthropic’s shift from pure model-as-a-service to a broader platform approach that integrates tools, agents, and developer infrastructure.

With Sonnet 4.5, Anthropic is not just chasing performance benchmarks; it is building a complete ecosystem for AI-powered development. As rivals like Google and OpenAI prepare new releases, the race is no longer just about raw model power, but about delivering reliable, safe, and practical tools that developers can trust in real workflows. In this regard, Anthropic has taken a decisive step forward.

Anthropic Launches Claude Sonnet 4.5 with 30-Hour Coding Focus for Advanced AI Agents
