Anthropic Revises Coding Test as Claude Outperforms Human Applicants, Challenging AI-Assisted Hiring

Since 2024, Anthropic’s performance optimization team has used a take-home technical test to evaluate job applicants, aiming to assess real-world problem-solving and coding skills. But as AI coding tools like Claude have advanced, the team has repeatedly had to redesign the test so that it still distinguishes candidates rather than simply measuring the model’s output.

Team lead Tristan Hume detailed the evolving challenge in a blog post on Wednesday. “Each new Claude model has forced us to redesign the test,” he wrote. “When given the same time limit, Claude Opus 4 outperformed most human applicants. That still allowed us to distinguish the strongest candidates — but then, Claude Opus 4.5 matched even those.”

The test is intentionally open to AI assistance, reflecting the real-world environment in which developers use AI tools to boost productivity. But that flexibility has created a growing problem: if the model can produce results indistinguishable from, or better than, top human performance, the test loses its ability to identify truly exceptional talent.

“The issue of AI use on assessments is already a major concern in education,” Hume noted, “so it’s ironic that AI labs are now facing the same challenge. But Anthropic is uniquely positioned to address it — because we’re building the tools that are making the problem possible.”

To counter this, Hume developed a new version of the test that shifts focus away from traditional optimization tasks, such as tuning code for hardware efficiency, toward novel, open-ended problems that demand creative thinking and contextual understanding — areas where current AI models still struggle.

Even so, Hume invited the public to try the original test: “If you can best Opus 4.5, we’d love to hear from you.” The invitation underscores both the difficulty of the challenge and Anthropic’s confidence in its ability to adapt to the fast-moving AI landscape.

The company has clarified that AI tools are permitted during the test, a point misstated in an earlier version of this article. TechCrunch regrets the error and confirms that the use of AI is explicitly allowed and intended as part of the evaluation process.
