China’s Zhipu Launches GLM-5, a 744B-Parameter Frontier AI Model Trained on Domestic Hardware, Topping Open-Weight Benchmarks in a Milestone for Sovereign AI
On February 11, 2026, Zhipu AI officially launched GLM-5, its latest frontier large language model, just days before the Lunar New Year. The release marks a major milestone for China’s AI industry: GLM-5 is now the top open-weight model on Artificial Analysis and #1 among open models on LMArena’s Text Arena, scoring 1452 (#11 overall). It achieved 77.8% on SWE-bench Verified, 92.7% on AIME 2026, and 86.0% on GPQA-Diamond, and leads benchmarks such as BrowseComp, Vending Bench 2, and MCP-Atlas.

GLM-5 is a 744B-parameter Mixture-of-Experts model with 40B active parameters per token, roughly double the scale of GLM-4.5. It was trained on 28.5T tokens, up from 23T for its predecessor, and uses DeepSeek Sparse Attention for efficient long-context processing with a 200K-token context window. The model is released under the MIT license on Hugging Face, available via the Zhipu AI API, and already accessible through OpenRouter.

Notably, GLM-5 was trained entirely on Huawei Ascend chips using the MindSpore framework, with no reliance on NVIDIA hardware. This is a significant achievement given that Zhipu has been on the U.S. Entity List since January 2025, which restricts its access to advanced GPUs such as the H100 and H200. The successful training of a frontier model under these constraints underscores the growing maturity of China’s domestic AI infrastructure.

On SWE-bench Verified, GLM-5 outperforms Gemini 3 Pro (76.2%) and GPT-5.2 (75.4%), though it still trails Claude Opus 4.5 (80.9%). Internally, it achieved a 98% success rate on frontend builds and 74.8% end-to-end correctness on CC-Bench-V2, up 26% from GLM-4.7, indicating progress toward agentic engineering rather than “vibe coding.” Its Vending Bench 2 result, in which the model managed a simulated business over a full year, highlights improved long-horizon planning.

A standout feature is its performance on the AA-Omniscience Index, where GLM-5 scored -1, a 35-point improvement over its predecessor.
This reflects a strong ability to recognize when it does not know something, reducing hallucinations, a key concern for production use.

The launch follows the stealth release of “Pony Alpha” on OpenRouter on February 6, which quickly gained traction, processing 40 billion tokens in a single day. The model’s behavior, prompt responses, and timing strongly pointed to GLM-5, suggesting a strategic beta test ahead of the official launch. This approach allows real-world feedback without the pressure of public hype.

Pricing is set at $1.00 per million input tokens and $3.20 per million output tokens, about 5x cheaper on input and nearly 8x cheaper on output than Claude Opus 4.6. Even so, it remains expensive compared to earlier GLM models and other Chinese MoEs. Zhipu also raised prices on its GLM Coding Plan by 30% to meet demand, and its Hong Kong-listed stock rose 34% on launch day.

With 744B parameters, GLM-5 is primarily an API model. FP8 inference requires at least 8 H200 or H20 GPUs, making local deployment impractical for most users. While some speculate about running it on M4 Ultra Macs with 512GB of unified memory, it remains a high-end, infrastructure-heavy deployment.

GLM-5 arrives at a pivotal moment. Zhipu is now the world’s first publicly traded foundation model company, having raised $558 million in a Hong Kong IPO at a $7.1 billion valuation, unlike private rivals such as OpenAI and Anthropic. The model’s adoption of DeepSeek Sparse Attention and training techniques signals DeepSeek’s continued influence on model architecture. The use of diverse domestic chips from Moore Threads, Cambricon, and Kunlunxin confirms that China’s AI stack is no longer just theoretical but production-ready.

However, GLM-5 is text-only, lacking the native multimodal capabilities that Kimi K2.5 from Moonshot AI already offers. Early users also report weaker situational awareness compared to Claude, with mixed results on “vibe test” evaluations.
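As a rough sanity check on the scale, deployment, and pricing figures reported above, the arithmetic can be sketched in a few lines. The parameter counts and per-token prices come from the article; the GPU memory sizes, the 2-FLOPs-per-parameter rule of thumb, and the sample workload are illustrative assumptions, not reported figures.

```python
# Back-of-envelope numbers behind the scale, deployment, and pricing claims.

TOTAL_PARAMS = 744e9     # all experts combined (from the article)
ACTIVE_PARAMS = 40e9     # parameters routed per token (from the article)

# Only ~5% of the network is touched per token, so per-token compute is
# closer to that of a 40B dense model than a 744B one.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"active fraction: {active_fraction:.1%}")          # ~5.4%

# FP8 stores one byte per parameter, so the weights alone need ~744 GB.
# Eight H200s (141 GB HBM each) give 1128 GB; the headroom goes to KV cache,
# activations, and framework overhead, which is why 8 GPUs is a floor.
weights_gb = TOTAL_PARAMS * 1 / 1e9
print(f"weights: {weights_gb:.0f} GB vs {8 * 141} GB on 8x H200")

# Cost of a hypothetical agentic coding session at the quoted API prices.
IN_PRICE, OUT_PRICE = 1.00, 3.20   # USD per million tokens (from the article)
in_tok, out_tok = 2_000_000, 500_000  # assumed workload, for illustration
cost = in_tok / 1e6 * IN_PRICE + out_tok / 1e6 * OUT_PRICE
print(f"sample session: ${cost:.2f}")                     # $2.00 + $1.60 = $3.60
```

The same arithmetic puts the Mac speculation in perspective: 744 GB of FP8 weights would not fit in 512 GB of unified memory, so any such setup would presumably depend on far more aggressive quantization.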
Some also question the transparency of its benchmarking methodology, underscoring the need for independent verification.

In summary, GLM-5 is the most capable open-weight model yet for coding and agentic tasks, built entirely on Chinese hardware under U.S. sanctions. It represents a major leap in both technical capability and strategic autonomy for China’s AI ecosystem.
